Posted 1 week ago

Manager - Network (Ref# MN250903W)

HONG KONG CYBERPORT MANAGEMENT CO LTD

Job Description

Key Responsibilities:

  • Report to Head of Information and Communication Technology (ICT).
  • Design and build best-in-class technical infrastructure for delivering AI supercomputing systems according to the required technical architecture, performance, resilience, security, energy efficiency and service levels.
  • Configure, operate and maintain the network and communication infrastructure of the AI supercomputing systems.
  • Develop and maintain network and communication infrastructure policy, standards and procedures of AI supercomputing systems to align with industry standards, frameworks and good practices.
  • Collaborate with network and communication infrastructure suppliers to optimise AI frameworks and their operational environment and libraries to maximise performance and effectiveness for adopting the AI supercomputing systems.
  • Manage technical infrastructure outsourcing services and their contractual agreements; implement effective measures to improve the effectiveness, compliance and quality of service delivery; and manage the relationship with the outsourcing partners.
  • Define network and communication infrastructure technical requirements and interconnect capabilities to optimise the operation and performance of AI supercomputing systems.
  • Conduct performance analysis, benchmarking and modeling to identify performance bottlenecks, optimise system parameters and guide the technical infrastructure enhancements.
  • Evaluate emerging AI supercomputing technologies, including GPU processors, network fabrics, storage and interconnects, for the continuous advancement of AI supercomputing systems based on technical requirements, performance characteristics and cost considerations.
  • Collaborate with subject domain experts to understand the specific requirements of scientific research and data-intensive workloads for using the AI supercomputing systems and propose appropriate technical infrastructure enhancements to manage the workloads.
  • Provide necessary support to conduct risk assessment, evaluate infrastructure control effectiveness and mitigate associated risks.
  • Monitor and analyse technical infrastructure events and alerts; identify and respond to infrastructure related risks, incidents and breaches.
  • Stay up-to-date with the latest advancements in AI supercomputing hardware, software and industry trends to guide future infrastructure design and technology adoption.
  • Provide appropriate technical guidance, training and support for effective use of the AI supercomputing systems.
  • Prepare management information, key metrics and reports for continuous improvement.

Requirements:

  • 5+ years of proven experience as a network and communication infrastructure specialist, preferably in an AI supercomputing environment.
  • Experience in AI supercomputing network and communication infrastructure design and implementation.
  • Experience in designing and optimising AI supercomputing infrastructure and systems for business, scientific, research, or data-intensive applications.
  • Experience in adopting hardware acceleration technologies such as GPU and NPU.
  • Understanding of performance analysis and optimization techniques for parallel computing, including profiling, tracing, and performance counters.
  • Familiarity with industry-standard interconnects and network fabrics and their impact on performance of AI supercomputing systems.
  • Knowledge of parallel programming models and frameworks and their application to AI supercomputing workloads.
  • Understanding of AI supercomputing software stack components, such as compilers, runtime systems, job schedulers, and development libraries.
  • Understanding of deep learning frameworks such as TensorFlow, PyTorch and Caffe.
  • Understanding of programming languages such as Python, Java, C++, R and CUDA for building and implementing AI systems.
  • Good problem-solving abilities and the ability to analyse and address complex performance and scalability challenges.
  • Ability to adapt to a fast-paced and rapidly evolving technological landscape.
  • Strong communication and collaboration skills to work effectively with cross-functional teams and subject domain experts.
  • Proficiency in written and spoken English and Chinese.
  • Passion for AI technical architecture, infrastructure and systems.

Other Details

Job Vacancy Source
CPjobs
Reference Number
Joo-4229595
Posting Date
28 November 2025
Keywords
Information Technology
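For users' convenience, the Talent Services Office website provides job vacancy information and related links aggregated from other websites. This website accepts no responsibility for the content provided by those websites.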