Solutions Architect, AI Infrastructure

Remote from: Canada
Salary, yearly, CAD: 135,000 - 185,000
Department: Software Engineering
Employment type: Full Time
Job posted: 11 May 2026
Apply before: 11 Jun 2026
Experience level: Senior
Views / Applies: 161 / 38

About NVIDIA

NVIDIA is a leader in AI computing and graphics technology.

Industry: Computer Hardware
Founded: 1993

AI Summary

NVIDIA seeks an experienced Solutions Architect to lead the design and deployment of large-scale AI infrastructure for cloud partners in Canada. This role involves guiding customers on network design, compute/storage, and cluster deployments, while acting as a trusted advisor throughout the customer lifecycle. The ideal candidate has 5+ years of solution engineering experience, deep knowledge of GPU servers, Linux, networking (Ethernet/InfiniBand), and DevOps/MLOps tools like Kubernetes. This position offers the opportunity to work with cutting-edge AI technology and collaborate with cross-functional teams at a leading tech company.

Job Complexity

AI Insight: This role requires a high level of technical expertise in GPU infrastructure and networking, along with customer-facing skills, making it challenging. However, the requirements are typical for senior solutions architects, so it is not at the hardest level.

Salary Analysis

AI Insight: The offered salary range of CAD 135,000 to 185,000 (midpoint CAD 160,000) is competitive for a Solutions Architect in Canada, aligning with market rates for senior roles in AI infrastructure. The US market median for similar roles is around $160,000 USD, but given the Canadian location, the offer is reasonable.

Key Skills

AI Infrastructure, GPU Computing, Solution Architecture, Networking, Kubernetes, Linux, DevOps, InfiniBand, Customer Engagement, Technical Leadership

Cover Letter Sample

Dear Hiring Manager,

I am writing to express my strong interest in the Solutions Architect, AI Infrastructure position at NVIDIA. With over 5 years of experience in solution engineering and a deep background in GPU computing, networking, and DevOps, I am confident in my ability to drive the design and deployment of large-scale AI infrastructure for your cloud partners.

In my previous role, I successfully led the architecture and deployment of high-performance GPU clusters, optimizing network performance with InfiniBand and Ethernet. I have hands-on experience with Kubernetes, Docker, and system-level debugging, which aligns with your requirements. I am passionate about AI and eager to contribute to NVIDIA's mission of transforming computing.

Thank you for considering my application. I look forward to the opportunity to discuss how my skills can benefit your team.

Sincerely,
[Your Name]

Possible Interview Questions

Can you describe your experience designing and deploying large-scale GPU clusters? What were the key challenges and how did you overcome them?

In my previous role, I led the deployment of a 1000-GPU cluster for an AI research lab. Key challenges included network congestion and thermal management. We overcame these by using InfiniBand with adaptive routing and implementing liquid cooling solutions. I also coordinated with vendors to optimize firmware settings.

How would you troubleshoot a performance issue in a GPU cluster involving both compute nodes and the network?

I would start by isolating the bottleneck using tools like NCCL tests and monitoring GPU utilization, then check fabric health with ibdiagnet and measure link latency with perftest tools such as ib_send_lat. If the issue is network-related, I would examine switch configurations and cable errors. For compute, I would review kernel logs and driver versions. Finally, I would correlate metrics from both sides to pinpoint the root cause.

Explain how you would guide a customer through the design of an AI data center network, considering both Ethernet and InfiniBand options.

I would begin by understanding their workload requirements—e.g., training vs. inference, data size, and latency sensitivity. For training, I'd recommend InfiniBand for low latency. I'd design a fat-tree topology with appropriate oversubscription, ensuring enough bandwidth for GPU communication. I'd also discuss power and cooling constraints, and provide a reference architecture with redundancy.
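The oversubscription point in the answer above can be made concrete with a small calculation. The following is a minimal Python sketch, not an NVIDIA tool: the function names, port counts, and 400 Gb/s link speeds are hypothetical values chosen for illustration. It computes the ratio of host-facing to spine-facing bandwidth on one leaf switch, and the endpoint count of a non-blocking two-tier (leaf-spine) fat tree built from identical switches.

```python
from fractions import Fraction


def oversubscription_ratio(downlinks: int, uplinks: int,
                           downlink_gbps: int = 400,
                           uplink_gbps: int = 400) -> Fraction:
    """Host-facing bandwidth divided by spine-facing bandwidth on one leaf.

    A result of 1 means non-blocking; 3 means 3:1 oversubscribed.
    Port counts and speeds are illustrative assumptions.
    """
    if downlinks <= 0 or uplinks <= 0:
        raise ValueError("port counts must be positive")
    return Fraction(downlinks * downlink_gbps, uplinks * uplink_gbps)


def nonblocking_two_tier_endpoints(radix: int) -> int:
    """Maximum endpoints in a non-blocking two-tier fat tree.

    Each leaf splits its ports evenly (radix/2 down, radix/2 up), and each
    spine port connects one leaf, so there are at most `radix` leaves.
    """
    if radix <= 0 or radix % 2 != 0:
        raise ValueError("radix must be a positive even number")
    return radix * (radix // 2)


if __name__ == "__main__":
    # Hypothetical 64-port leaf: 32 ports to GPUs, 32 to spines.
    print(oversubscription_ratio(32, 32))
    # Same switch with 48 ports down and 16 up.
    print(oversubscription_ratio(48, 16))
    print(nonblocking_two_tier_endpoints(64))
```

For GPU training traffic, a ratio of 1 (non-blocking) is the usual starting point in such a discussion; a higher ratio trades bandwidth between leaves for more endpoints per switch.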

Describe a time when you had to manage multiple customer projects simultaneously. How did you prioritize and ensure successful outcomes?

I used a project management tool to track milestones and deadlines. I held weekly status meetings with each customer and communicated proactively about any delays. For urgent issues, I escalated to internal teams. I also delegated tasks to junior engineers when appropriate, ensuring I could focus on critical design decisions.

How do you stay current with the latest AI infrastructure technologies, and how would you incorporate new NVIDIA technologies into customer solutions?

I regularly attend NVIDIA GTC conferences, read technical blogs, and participate in internal training. When a new technology like the latest GPU or networking switch is released, I evaluate its benefits and limitations. I then create updated reference architectures and present them to customers, highlighting performance gains and compatibility considerations.

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

NVIDIA is seeking an experienced AI Infrastructure Solutions Architect (SA) to bridge design and deployment of large-scale GPU infrastructure. As part of the NVIDIA SA organization, you will interact with customers, partners, and internal teams to analyze, define, and implement large-scale AI/HPC projects, and offer recommendations to business and engineering teams on our product roadmap.

What you’ll be doing:

Work with NVIDIA Cloud Partners in Canada on large data center GPU server and networking system deployments. Guide customer discussions on network design and compute/storage, and support bring-up of server/network/cluster deployments. You will need to visit customer data centers during the bring-up phase.
Become the primary technical driver for customers during the design, development, construction, integration, and production of GPU Cloud infrastructure and applications throughout the entire customer lifecycle.
Work as the customer’s trusted advisor, conducting regular technical customer meetings covering product roadmap, cluster issue debugging, feature discussions, and introductions to new technology solutions.
Partner with other SAs, Account Managers, Engineering, Product, and business leaders to align on strategies, assess technical needs, and secure business opportunities for NVIDIA.
Analyze and debug compute/network configuration and performance issues to deliver performant clusters.
Prepare and deliver technical content to customers including presentations, workshops, reference architectures, tutorials, publications.

What we need to see:

BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, Mathematics, or other Engineering fields or equivalent experience.
5+ years of Solution Engineering (or similar Sales Engineering, Cloud Engineering, Solution Architecture) experience, including working directly with partners and customers.
System-level expertise in CPU/GPU server architecture, NICs, Linux, system software, and kernel drivers.
Experience with networking switches for Ethernet/InfiniBand, and data center infrastructure (power/cooling).
Knowledge of DevOps/MLOps technologies such as Docker/containers, Kubernetes.
Efficient time management and the ability to balance multiple tasks. Excellent presentation, communication, and collaboration skills.
Self-starter with a passion for growth, continuous learning, and sharing insights.

Ways to stand out from the crowd:

Familiarity with NVIDIA GPUs, NVIDIA Networking technologies (e.g. NICs, RoCE, InfiniBand), and systems technology such as NCCL, DCGM, UFM, Mission Control, and Base Command Manager.
Experience with bring-up and deployment of large GPU clusters, including deploying and optimizing high-speed networks (InfiniBand/Ethernet), with a clear understanding of how network architecture impacts GPU cluster performance.
Systems engineering, coding, and debugging skills including experience with C/C++, Linux kernel and drivers.
Experience working with enterprise developers and strong customer-facing skills.

We make extensive use of conferencing tools, but occasional travel is required for on-site visits to customers and industry events. We have some of the most forward-thinking and hardworking people in the world working for us. If you’re creative and autonomous, we want to hear from you!

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 135,000 CAD – 185,000 CAD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 15, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.
