Engineering Manager, Ads ML Efficiency

Remote from: USA
Salary, yearly, USD: 230,000 - 322,000
Department: Software Engineering
Employment type: Full Time
Job posted: 23 Jun 2026
Apply before: 23 Jul 2026
Experience level: Midweight

About Reddit

Dive into anything

Industry: Internet
Founded: 2005

AI Summary

Reddit is seeking an Engineering Manager to lead a new Ads ML Efficiency team focused on making model training and inference faster, cheaper, and more scalable. The role combines technical leadership in ML optimization with people management, requiring deep expertise in distributed systems and GPU computing. The manager will define the roadmap, deliver measurable efficiency wins, and build tooling for profiling, load testing, and cost analysis. This position is highly collaborative, working closely with ranking, ML platform, and serving teams. Ideal candidates have hands-on optimization experience and a strong background in ads ranking or recommender systems.

Role DNA

Job Complexity: Easy to Hard
Pace & Pressure: Relaxed to Fast-paced
Autonomy Level: Guided to Full Ownership
Communication Load: Independent to Highly Collaborative

AI Insight: The role requires deep technical expertise in ML optimization and distributed systems, combined with strong managerial and cross-functional leadership skills, making it highly challenging but not the hardest level.

Salary Analysis

AI Insight: The offered salary range of $230,000 to $322,000 is competitive for this Engineering Manager role in Ads ML at a major tech hub like San Francisco, aligning well with market standards for similar positions.

Key Skills

Machine Learning, Engineering Management, Model Optimization, GPU Computing, Distributed Systems, PyTorch, Ads Ranking, Performance Tuning, Cross-functional Leadership, Python

Cover Letter Sample

Dear Hiring Manager,

I am writing to express my strong interest in the Engineering Manager, Ads ML Efficiency position at Reddit. With a deep background in machine learning engineering and a proven track record of leading high-performance teams, I am excited about the opportunity to drive efficiency improvements in model training and inference. My experience optimizing GPU utilization and building scalable distributed systems directly aligns with the needs of this role.

At my previous company, I successfully reduced model training costs by 30% through profiling and optimization while maintaining service reliability. I thrive in cross-functional environments and have a passion for building tooling that empowers teams to ship faster. Reddit's mission to foster authentic communities resonates with me, and I am eager to contribute to its advertising technology.

Sincerely,

[Your Name]

Possible Interview Questions

How would you prioritize efficiency improvement projects across multiple model teams?

I would assess each opportunity based on potential impact on cost, latency, and scalability, while considering alignment with business goals. I'd start by identifying the biggest bottlenecks through profiling and collaboration with teams, then prioritize quick wins that build credibility and fast-follow with longer-term platform investments.

Describe a time you led a team to achieve a significant reduction in model training time. What was your approach?

In a previous role, we identified that data loading was a major bottleneck. I led the team to implement a distributed caching layer and optimize the data pipeline using parallel I/O. We also tuned hyperparameters and reduced model size through pruning. Training time decreased by 40%, and we established benchmarks to continuously monitor efficiency.

How do you balance the need for speed (fast iterations) with reliability and cost in ML systems?

I advocate for a culture of measurement and automation. By establishing clear metrics for latency, throughput, and cost, we can make data-driven tradeoffs. For instance, we can use canary deployments and automated rollback to ensure reliability while iterating quickly. Additionally, I invest in tooling that provides early warnings on cost or performance regressions.
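
To make the "early warnings on cost or performance regressions" idea concrete, here is a minimal sketch of a regression check that compares a candidate's metrics against a baseline. Everything in it is an assumption made for illustration: the function name `find_regressions`, the metric names, and the tolerance values are hypothetical and are not taken from Reddit's tooling. It only handles metrics where lower is better (latency, cost); a throughput check would need the comparison reversed.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerances: Dict[str, float],
) -> List[str]:
    """Return a message for each lower-is-better metric that got worse
    by more than its allowed relative increase.

    All metric names and tolerances are hypothetical examples.
    """
    regressions = []
    for name, allowed in tolerances.items():
        base = baseline[name]
        cand = candidate[name]
        if base <= 0:
            # A relative change is undefined for a non-positive baseline.
            continue
        change = (cand - base) / base
        if change > allowed:
            regressions.append(
                f"{name}: +{change:.1%} (allowed +{allowed:.1%})"
            )
    return regressions


if __name__ == "__main__":
    baseline = {"p99_latency_ms": 100.0, "cost_per_1k_usd": 2.0}
    candidate = {"p99_latency_ms": 110.0, "cost_per_1k_usd": 2.02}
    tolerances = {"p99_latency_ms": 0.05, "cost_per_1k_usd": 0.05}
    for message in find_regressions(baseline, candidate, tolerances):
        print(message)
```

A check like this could run before a canary is promoted, so that a latency or cost regression blocks the rollout rather than being discovered afterwards.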

How would you handle a situation where a model team resists optimization changes that could improve efficiency but might introduce risk?

I would first listen to their concerns and understand the risks from their perspective. Then, I would propose a phased approach with thorough testing, A/B experimentation, and gradual rollout. I'd also highlight the long-term benefits and offer to provide engineering support to minimize risk. Building trust through transparent communication and sharing success stories from other teams helps.

What experience do you have with GPU training and inference optimization at scale?

I have worked extensively with PyTorch and distributed training frameworks like DeepSpeed and Horovod. I optimized GPU memory usage by implementing gradient checkpointing and mixed precision training. For inference, I have used model quantization and batching strategies to reduce latency and improve throughput. I also led a migration from CPU to GPU serving, resulting in a 5x reduction in latency.
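
The batching strategy mentioned in this answer can be illustrated with a small sketch. It groups requests into fixed-size batches and applies a deliberately simplified cost model in which every batch pays a fixed overhead once plus a per-request cost. The function names and all numbers are hypothetical, and real serving systems also have to weigh the queueing delay that batching adds, which this sketch ignores.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(requests: Sequence[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most max_batch_size requests."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(requests), max_batch_size):
        yield list(requests[start:start + max_batch_size])


def total_cost_ms(
    num_requests: int,
    max_batch_size: int,
    fixed_ms: float,
    per_item_ms: float,
) -> float:
    """Hypothetical cost model: each batch pays fixed_ms once,
    plus per_item_ms for every request it contains."""
    batches = make_batches(range(num_requests), max_batch_size)
    return sum(fixed_ms + per_item_ms * len(batch) for batch in batches)


if __name__ == "__main__":
    # Same 8 requests, served one at a time versus in batches of 4.
    print(total_cost_ms(8, 1, fixed_ms=5.0, per_item_ms=1.0))
    print(total_cost_ms(8, 4, fixed_ms=5.0, per_item_ms=1.0))
```

Under this assumed model, larger batches amortize the fixed per-call overhead across more requests, which is the throughput gain the answer refers to.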

Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.

Reddit has a flexible workforce! If you happen to live close to one of our physical office locations our doors are open for you to come into the office as often as you’d like. Don’t live near one of our offices? No worries: You can apply to work remotely in any country in which we have a physical presence.

About the Role

Reddit is building a dedicated Ads ML Efficiency function to make model training and inference materially faster, cheaper, safer, and more scalable. As the Engineering Manager for this team, you will lead a group focused on model optimization, training efficiency, GPU enablement, load testing, model performance tooling, and efficiency guardrails across Ads ML.

This role sits at the intersection of ML modeling, systems optimization, and organizational leverage. You will partner closely with ranking teams, ML Platform teams and serving owners to identify the highest-value bottlenecks, land measurable efficiency wins, and build the tooling and operating mechanisms that make those wins repeatable.

What you’ll do:

Lead & Grow: Hire, mentor, and retain a high-performing team of ML engineers / systems-oriented engineers working on model optimization and ML efficiency.
Set Technical Direction: Define the roadmap for training optimization, inference optimization, launch-readiness tooling, and reusable efficiency primitives across Ads ML.
Deliver Measurable Wins: Drive reductions in model training time, online latency, serving cost, and infra-driven launch risk.
Build Systems and Tooling: Guide the development of profiling, benchmarking, load testing, observability, cost analysis, debugging, and efficiency certification systems.
Operate in the Critical Path: Partner with model owners and platform teams to accelerate high-priority launches and remove bottlenecks from the path to production.
Shape the Team’s Evolution: Balance near-term white-glove optimization work with medium-term platformization and automation.
Build XFN Alignment: Work closely with MLP, AMP, Ranking, and serving teams to clarify boundaries, upstream generic wins, and keep Ads needs on track.
Raise the Bar: Establish engineering rigor around measurement, performance debugging, launch safety, and technical decision-making for efficiency work.

What we’re looking for:

Deep ML Engineering Experience: The candidate should have been close to the models themselves and understand training, serving, debugging, and optimization in depth.
Hands-on Optimization Background: Direct experience improving training loops, serving systems, profiling workflows, model/inference efficiency, or GPU utilization.
Strong Managerial Ability: Experience building and leading teams, coaching engineers, managing delivery, and making prioritization tradeoffs under ambiguity.
Distributed Systems Fluency: Proven ability to reason about production-scale ML systems and the tradeoffs that govern reliability, speed, cost, and scale.
Customer and Platform Instincts: Able to work as a service provider to modeling teams while still building reusable systems rather than only heroic one-offs.
Strong Communication: Can explain technical tradeoffs clearly to engineers, PMs, and senior stakeholders.
Ads Experience: Experience in ads ranking, recommender systems, marketplace ML, or adjacent production ML domains is strongly preferred.

Nice-to-have:

Experience with GPU training and serving migrations.
Experience with PyTorch, distributed training frameworks, or kernel/performance optimization.
Experience building efficiency benchmarking or launch certification frameworks.
Experience working in organizations where ML platform and applied modeling responsibilities are split across multiple teams.

Benefits:

Comprehensive Healthcare Benefits and Income Replacement Programs
401k with Employer Match
Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
Family Planning Support
Gender-Affirming Care
Mental Health & Coaching Benefits
Flexible Vacation & Paid Volunteer Time Off
Generous Paid Parental Leave

Pay Transparency:

This job posting may span more than one career level.

In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/.

To provide greater transparency to candidates, we share base salary ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar stage growth companies. Final offer amounts are determined by multiple factors, including skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below.

The base salary range for this position is:

$230,000—$322,000 USD

In select roles and locations, the interviews will be recorded, transcribed and summarized by artificial intelligence (AI). You will have the opportunity to opt out of recording, transcription and summarization prior to any scheduled interviews.

During the interview, we will collect the following categories of personal information: Identifiers, Professional and Employment-Related Information, Sensory Information (audio/video recording), and any other categories of personal information you choose to share with us. We will use this information to evaluate your application for employment or an independent contractor role, as applicable. We will not sell your personal information or disclose it to any third party for their marketing purposes. We will delete any recording of your interview promptly after making a hiring decision. For more information about how we will handle your personal information, including our retention of it, please refer to our Candidate Privacy Policy for Potential Employees and Contractors.

Reddit is proud to be an equal opportunity employer, and is committed to building a workforce representative of the diverse communities we serve. Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If, due to a disability, you need an accommodation during the interview process, please let your recruiter know.
