Senior Machine Learning Systems Engineer, Ads ML Experience Platform

Remote from: USA
Salary, yearly, USD: 216,700 - 303,400
Employment type: Full Time
Apply before: 23 Jul 2026
Experience level: Senior

About Reddit

Dive into anything

Verified job posting
This job post has been manually reviewed for authenticity and compliance.

AI Summary

Reddit is seeking a Senior Machine Learning Systems Engineer to join the Ads ML Experience Platform team, focusing on building the next generation of ML research tools and agentic AI platforms. The role involves designing large-scale offline ML experimentation platforms, production training orchestration frameworks, and infrastructure for experiment tracking and model registries. The ideal candidate has 5+ years in infrastructure engineering, experience with distributed systems, and familiarity with workflow orchestration technologies like Kubeflow, Argo, or Airflow. Knowledge of agentic AI systems is a strong plus. This is a senior-level position offering a competitive salary and equity package.

Role DNA

AI Insight: The role requires deep expertise in large-scale distributed systems, ML platforms, and agentic architectures, as well as 5+ years of experience. The breadth and complexity of the responsibilities make it a challenging position, but not the highest difficulty level because it is still within the realm of experienced senior engineers.

Salary Analysis

Median: USD 260,050 (Highly Competitive)
US market range: USD 180k – 320k
AI Insight: The offered salary range of $216,700 to $303,400 per year is highly competitive and falls within the typical market range for senior ML systems engineers, which usually runs from $180,000 to $320,000. This reflects Reddit's commitment to attracting top talent and the high value placed on this role.

Key Skills

Machine Learning Systems, Distributed Systems, Platform Engineering, ML Infrastructure, Kubernetes, Spark, Kubeflow, Agentic AI, Orchestration, Model Registry

Dear Hiring Manager,

I am writing to express my strong interest in the Senior Machine Learning Systems Engineer position at Reddit. With over 5 years of experience in infrastructure engineering and a deep passion for building scalable ML platforms, I am excited about the opportunity to contribute to Reddit's Ads ML Experience Platform. My expertise in distributed systems, workflow orchestration, and agentic AI aligns well with the requirements of this role.

In my previous role, I designed and built large-scale offline ML experimentation systems and production training frameworks, significantly improving model iteration velocity. I have hands-on experience with technologies such as Spark, Kubeflow, and Kubernetes, and I am eager to bring my skills to Reddit to help accelerate the ML lifecycle.

Thank you for considering my application. I look forward to the possibility of discussing how I can contribute to your team.

Sincerely,
[Your Name]

Describe your experience building large-scale offline ML experimentation platforms. What challenges did you face and how did you overcome them?
I built an offline experimentation platform that handled thousands of experiments daily using a microservices architecture. Key challenges included reproducibility and data consistency. We implemented experiment tracking with metadata versioning and used containers to ensure consistent environments. For scalability, we used Kubernetes to dynamically allocate resources.
How would you design a workflow orchestration system for distributed training with hyperparameter optimization?
I would use a framework like Kubeflow or Argo to define pipelines. For hyperparameter optimization, I'd integrate a service like Optuna or Ray Tune. The system would manage task scheduling, resource allocation, and fault tolerance. I'd also implement automated retraining triggers based on model performance metrics.
Explain your experience with agentic AI systems. How would you build an agentic execution platform for ML workflows?
I've worked on multi-agent systems where agents collaborate via message passing. For an ML platform, I'd design agents that handle data preprocessing, model training, evaluation, and deployment. The platform would include a runtime with memory and context management, and support human-in-the-loop for critical decisions.
How do you ensure reproducibility in ML experiments at scale?
Reproducibility requires tracking code, data, environment, and parameters. I use tools like DVC or MLflow to version data and models, and containers to lock dependencies. Each experiment captures the exact commit hash, dataset version, and hyperparameters. At large scale, we store artifacts in a model registry with lineage tracking.
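As a rough illustration of the idea in this answer, the sketch below records the commit hash, dataset version, environment, and hyperparameters of a run and derives a stable fingerprint from them. It uses only the Python standard library; the class name, field names, and sample values are hypothetical and do not come from DVC, MLflow, or any Reddit system.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ExperimentManifest:
    """Inputs needed to re-run an experiment (all field names are illustrative)."""

    commit_hash: str
    dataset_version: str
    container_image: str
    hyperparameters: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        # Serialize to canonical JSON (sorted keys, fixed separators) so that
        # logically identical manifests always produce the same hash.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical runs: `same_run` lists the same hyperparameters in a different
# order, while `new_data_run` changes only the dataset version.
run = ExperimentManifest(
    "3f2a9c1", "clicks-v12", "trainer:1.4.0", {"lr": 0.01, "batch_size": 256}
)
same_run = ExperimentManifest(
    "3f2a9c1", "clicks-v12", "trainer:1.4.0", {"batch_size": 256, "lr": 0.01}
)
new_data_run = ExperimentManifest(
    "3f2a9c1", "clicks-v13", "trainer:1.4.0", {"lr": 0.01, "batch_size": 256}
)

print(run.fingerprint() == same_run.fingerprint())
print(run.fingerprint() == new_data_run.fingerprint())
```

A fingerprint like this could serve as a lookup key in an experiment tracker or model registry: two runs with the same fingerprint used the same recorded inputs, and any change to code, data, image, or parameters yields a different key.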
Describe a time you improved operational efficiency for ML engineering teams through platform tooling.
In a previous role, I built a self-service SDK that automated model deployment and A/B testing. This reduced the time from experiment to production from weeks to days. I also implemented an automated monitoring system that detected data drift and triggered retraining, saving significant manual effort.
Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.

Reddit has a flexible workforce! If you happen to live close to one of our physical office locations, our doors are open for you to come into the office as often as you’d like. Don’t live near one of our offices? No worries: You can apply to work remotely in any country in which we have a physical presence.

Team Overview

We are building the next generation of ML research tools and agentic AI platforms that power machine learning development across Reddit. Our mission is to accelerate the Ads ML lifecycle – from experimentation and training to deployment, evaluation, and autonomous operations – through scalable platform services, intelligent automation, and developer-centric tooling.

Our team owns critical platform capabilities including offline ML experimentation systems, production training orchestration frameworks, ML lifecycle automation, and agentic ML frameworks that enable faster model iterations.

We are looking for an experienced engineer with deep expertise in large-scale distributed systems, ML platforms, and emerging agentic architectures to help define and build the foundation for the next generation of our machine learning devX tooling.

What You’ll Do

  • Design and build large-scale offline ML experimentation platforms that enable reproducible research, model development, evaluation, and promotion workflows.
  • Develop production-grade training orchestration frameworks supporting distributed training, hyperparameter optimization, model evaluation, and automated retraining.
  • Build infrastructure for experiment tracking, metadata management, lineage, artifact versioning, model registries, and reproducibility.
  • Partner with ML engineers and researchers to improve experimentation velocity and operational efficiency.
  • Build automated workflows for model promotion, rollback, compliance validation, and continuous evaluation.
  • Design and build an agentic AI execution platform supporting autonomous and human-in-the-loop workflows, including multi-agent orchestration, memory/context systems, and scalable workflow infrastructure.

What You Bring

  • 5+ years in infrastructure/platform engineering or large-scale distributed systems.
  • 2+ years of hands-on experience building and operating production ML infrastructure, developer SDKs, platform APIs, or self-service AI tooling.
  • Experience building workflow orchestration systems, developer platforms, or large-scale automation frameworks.
  • Experience with distributed data processing systems such as Spark, Flink, Ray, or equivalent technologies.
  • Experience with modern orchestration and workflow technologies such as Kubeflow, Argo, Airflow, or similar frameworks.
  • Experience building offline ML experimentation platforms, model registries, experiment tracking systems, or training orchestration frameworks.
  • Experience building and operating agentic AI systems, including multi-agent orchestration, autonomous workflows, and agent communication/runtime frameworks (e.g., MCP, A2A, and orchestration systems), is a strong plus.
  • Experience running end-to-end model development and iteration cycles at scale is a plus.

Pay Transparency:

This job posting may span more than one career level.

In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/.

To provide greater transparency to candidates, we share base salary ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar-stage growth companies. Final offer amounts are determined by multiple factors, including skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below.

The base salary range for this position is:
$216,700–$303,400 USD

In select roles and locations, the interviews will be recorded, transcribed and summarized by artificial intelligence (AI). You will have the opportunity to opt out of recording, transcription and summarization prior to any scheduled interviews.

During the interview, we will collect the following categories of personal information: Identifiers, Professional and Employment-Related Information, Sensory Information (audio/video recording), and any other categories of personal information you choose to share with us. We will use this information to evaluate your application for employment or an independent contractor role, as applicable. We will not sell your personal information or disclose it to any third party for their marketing purposes. We will delete any recording of your interview promptly after making a hiring decision. For more information about how we will handle your personal information, including our retention of it, please refer to our Candidate Privacy Policy for Potential Employees and Contractors.

Reddit is proud to be an equal opportunity employer, and is committed to building a workforce representative of the diverse communities we serve. Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If, due to a disability, you need an accommodation during the interview process, please let your recruiter know.

This job listing has been manually reviewed by the Jobicy Trust & Safety Team for compliance with our posting guidelines, including verification of the company's legitimacy, accuracy of job details, clarity of remote work policy, and absence of misleading or fraudulent content.
