Solutions Architect - AI Factory
Run:ai
NVIDIA is seeking an outstanding Solutions Architect for AI Factory to assist and support customers who are building solutions with our newest AI technology. At NVIDIA, our Solutions Architects work across different teams and enjoy helping customers with the latest Accelerated Computing and Deep Learning software and hardware platforms. We're looking to grow our company and build our teams with the smartest people in the world. Would you like to join us at the forefront of technological advancement?
You will become a trusted technical advisor to our customers and work on exciting projects and proofs of concept focused on AI Factory. This role is an excellent opportunity to work in an interdisciplinary team with the latest technologies at NVIDIA!
What you will be doing:
Maintain an up-to-date understanding of the philosophy, architecture, and deployment methods of various evolving NVIDIA Reference Architectures—e.g., NVIDIA DGX SuperPOD Reference Architecture, NVIDIA Cloud Partner Reference Architecture, and NVIDIA Enterprise Reference Architecture.
Analyze and understand the requirements of customer-initiated AI training or inference clusters.
Identify the NVIDIA Reference Architecture that best matches customer needs and effectively communicate its value proposition to collaborators.
Facilitate seamless communication between NVIDIA's internal deployment teams and customers during the implementation of AI clusters based on Reference Architectures.
Provide hands-on technical support to developers after the AI Factory has been deployed, ensuring that AI training and inference workloads run effectively on the infrastructure.
What we need to see:
Bachelor’s degree or higher in Computer Science, Computer Engineering, or a related technical field.
Solid understanding of basic principles behind cluster orchestration, such as compute resource provisioning and dynamic prioritization based on user demand.
Minimum of 3 years of hands-on experience operating AI training or inference clusters that leverage Kubernetes with NVIDIA GPUs.
Proficiency in key technologies including: Container Runtime Interface (CRI), Container Network Interface (CNI), Calico, NVIDIA GPU Operator, NVIDIA Network Operator, and Kubeflow Training Operator.
Ways to stand out from the crowd:
Foundational knowledge and experience with network technologies—such as InfiniBand and Ethernet—in AI cluster environments, including compute fabric interconnects between GPU servers, storage fabric integration, and in-band networks for system administration.
Familiarity with the role of storage in AI training/inference clusters, including hands-on experience with vector databases and leading commercial storage solutions.
Experience integrating MLOps platforms into Kubernetes environments, for example deploying Airflow to orchestrate distributed training workloads.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, passionate and self-motivated, we want to hear from you! NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services.