Research Engineer, SysML – FAIR

Remote from: USA
Annual salary: Undisclosed
Employment type: Full Time
Apply before: 3 Aug 2026

About Meta

Giving people the power to build community and bring the world closer together

Actively Hiring
Verified job posting
This job post has been manually reviewed for authenticity and compliance.

AI Summary

Meta is seeking a Research Engineer for its Fundamental AI Research (FAIR) team to advance machine learning systems. The role focuses on systems challenges to accelerate progress toward human-level intelligence, including distributed training, hardware-software co-design, and scalable ML infrastructures. Candidates should have expertise in systems, computer architecture, compilers, or ML, with experience in Python, C++, PyTorch, and large-scale ML execution. The position offers opportunities to publish research and impact Meta's products. Ideal candidates have a PhD or equivalent experience and a proven track record of significant results.

Role DNA

Job Complexity (Easy to Hard)
Pace & Pressure (Relaxed to Fast-paced)
Autonomy Level (Guided to Full Ownership)
Communication Load (Independent to Highly Collaborative)
AI Insight: This role calls for a deep understanding of systems, ML, and AI, and the posting also lists a PhD and a proven record of research contributions among its qualifications. It targets cutting-edge advances in ML systems, which makes it a highly challenging position.

Salary Analysis

Median market rate: $250,000 (US market)
Range: $150k – $500k
AI Insight: The salary for this role is not specified, but based on market data for senior research engineers in AI/ML at top tech companies, the median total compensation is approximately $250,000. This is competitive for the high level of expertise required.

Dear Hiring Manager,

I am excited to apply for the Research Engineer, SysML position at Meta FAIR. With a PhD in Computer Science and over 4 years of industry experience in systems and machine learning, I have developed scalable ML infrastructures and contributed to open-source projects like PyTorch. My research on distributed training and hardware-software co-design aligns perfectly with this role.

At my previous position, I led efforts to optimize training performance through advancements in cuBLAS and FlashAttention, resulting in a 30% speedup. I am passionate about open science and have published at top conferences such as MLSys and NeurIPS.

I am eager to bring my expertise in systems and AI to Meta FAIR and collaborate with world-class researchers to push the boundaries of artificial intelligence. Thank you for considering my application.

Sincerely,
[Your Name]

Can you describe a time when you optimized a large-scale machine learning training pipeline? What challenges did you face and how did you overcome them?
I worked on optimizing the training of a 175B parameter model. The main challenge was communication overhead in data parallelism. I implemented a gradient compression technique and used a hierarchical all-reduce algorithm, reducing communication time by 40% and overall training time by 20%.
How would you design a system to train a large language model efficiently across thousands of GPUs?
I would use a combination of data, model, and pipeline parallelism. For example, with 3D parallelism, I would shard the model across GPUs using tensor parallelism within nodes, pipeline parallelism across nodes, and data parallelism for scaling. I would also optimize the communication schedule and use mixed precision training to maximize throughput.
Describe your experience with hardware-software co-design in AI systems.
I have worked on designing custom kernels for new hardware accelerators. For instance, I collaborated with the hardware team to develop a fused attention kernel for a novel SRAM-based accelerator, achieving 5x speedup over off-the-shelf solutions. This required balancing algorithmic changes with hardware constraints.
How do you stay current with emerging AI technologies and apply them to your work?
I regularly read papers from top conferences and follow open-source projects. Recently, I integrated FlashAttention into our training pipeline, which improved throughput by 15%. I also experiment with new techniques like mixture of experts and adaptive computation to see if they benefit our models.
Describe a research project you led from conception to publication. What was your approach and what were the key results?
I led a project on memory-efficient training of transformers. I proposed a novel checkpointing scheme that reduced memory usage by 30% without sacrificing speed. We validated it on multiple benchmarks and published at NeurIPS. The work was adopted internally for training larger models.
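
The 3D-parallelism answer above (tensor parallelism within nodes, pipeline parallelism across nodes, data parallelism for scaling) can be illustrated with a small rank-layout sketch. This is only one possible layout, not something the posting or any specific library prescribes: the names `ParallelCoords`, `rank_to_coords`, and `tensor_group`, the example group sizes, and the rank ordering (tensor index varies fastest, then pipeline, then data) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelCoords:
    """Position of one GPU in a hypothetical 3D-parallel layout."""
    data: int      # data-parallel replica index
    pipeline: int  # pipeline stage index
    tensor: int    # tensor-parallel shard index


def rank_to_coords(rank: int, tp_size: int, pp_size: int, dp_size: int) -> ParallelCoords:
    """Map a global rank to (data, pipeline, tensor) coordinates.

    Assumed ordering: the tensor index varies fastest, so consecutive ranks
    (typically GPUs in the same node) share a tensor-parallel group.
    """
    world_size = tp_size * pp_size * dp_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    tensor = rank % tp_size
    pipeline = (rank // tp_size) % pp_size
    data = rank // (tp_size * pp_size)
    return ParallelCoords(data=data, pipeline=pipeline, tensor=tensor)


def tensor_group(rank: int, tp_size: int) -> list:
    """Return the global ranks that share a tensor-parallel group with `rank`."""
    start = (rank // tp_size) * tp_size
    return list(range(start, start + tp_size))


if __name__ == "__main__":
    # Assumed example: 8 GPUs per node, so tp_size=8 keeps each
    # tensor-parallel group inside a single node.
    tp, pp, dp = 8, 4, 2
    for r in (0, 13, 45):
        print(r, rank_to_coords(r, tp, pp, dp), tensor_group(r, tp))
```

Keeping the tensor index on consecutive ranks reflects the reasoning in the answer: tensor parallelism communicates most often, so it benefits most from the faster links inside a node, while pipeline and data parallelism tolerate the slower links between nodes.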

Meta is seeking Research Engineers to join Fundamental AI Research (FAIR). We are committed to advancing the field of artificial intelligence by making fundamental advances in technologies to help interact with and understand our world. We are seeking individuals passionate about solving systems challenges to sustainably accelerate progress toward human-level intelligence. Candidates will have an opportunity to make fundamental advances in systems and apply their ideas at an unprecedented scale.

The mission of Meta FAIR’s SysML research is to advance the state of AI through open science innovations. We explore, design, and build ML systems and infrastructures at scale with usability, efficiency, and sustainability as design principles. Some aspects of this role include enabling distributed training at an unprecedented scale through advancements and development in training libraries and authoring components such as cuBLAS, cuDNN, and FlashAttention, and accelerating training performance through hardware-software co-design.

Responsibilities

* Carry out cutting-edge research to advance the science and technology of machine learning systems
* Perform research that enables learning the semantics of data (images, video, text, audio, and other modalities)
* Devise better data-driven models of AI system design and optimization
* Contribute research that leads to innovations in: scalable machine learning systems, resource-efficient AI data and algorithm scaling and neural network architectures, memory and energy-efficient AI systems, environmentally-sustainable AI system and hardware designs
* Collaborate with researchers and cross-functional partners including communicating research plans, progress, and results
* Publish research results and contribute to research that impacts Meta product development

Qualifications

* Bachelor’s degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
* Master’s degree in the field of Computer Science, Computer Engineering, or equivalent practical experience
* 4+ years of domain-specific industry experience in areas related to development in systems, computer architectures, compiler and programming languages, machine learning, and artificial intelligence
* Experience with Python, C++, C, Rust or other related languages and with PyTorch framework
* Experience developing and optimizing systems for at-scale machine learning execution
* Experience devising data-driven models and real-system experiments and design implementation for AI system optimization
* Experience with scalable machine learning systems, resource-efficient AI data and algorithm scaling, or neural network architectures
* Experience solving complex problems and comparing alternative solutions, tradeoffs, and different perspectives to determine a path forward
* Experience working and communicating cross-functionally in a team environment

Preferred Qualifications

* PhD in the field of Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
* Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies
* Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)
* Demonstrated research and software engineering experience via work experience, coding competitions, or widely used contributions in open source repositories (e.g. GitHub)
* Proven track record of achieving significant results as demonstrated by grants, fellowships, patents, as well as publications at leading workshops, journals or conferences such as MLSys, ISCA, ASPLOS, HPCA, PLDI, CGO, NeurIPS, ICML, ICLR, or similar
* Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)
