AI Engineer

Remote from: Ukraine
Annual salary: Undisclosed
Department: Software Engineering
Employment type: Full Time
Job posted: 22 May 2026
Apply before: 22 Jun 2026
Experience level: Senior
Views / Applies: 24 / 8

Crafting Consumer Products of Tomorrow

Actively Hiring

AI Summary

Ruby Labs is seeking a senior AI Engineer to lead the development of production-ready LLM experiences using Node.js, Next.js, and TypeScript. The role involves advanced prompt engineering, structured outputs, and complex LLM workflows with LangChain or LlamaIndex. You will own key AI features from experimentation to deployment, using Langfuse for observability and OpenRouter for model management. The ideal candidate has deep experience with dynamic prompting, AI evaluation, and a data-driven mindset. This position is remote within ±4 hours of CET.

Job Complexity


AI Insight: The role requires senior-level expertise in multiple advanced areas including prompt engineering, LLM observability, A/B testing, and fine-tuning, which are highly specialized and complex skills.

Salary Analysis

AI Insight: The job posting does not specify a salary range. Based on market data for senior AI Engineer roles in the US, the typical range is $130,000 to $220,000 per year, with a median around $175,000. The offered salary is likely competitive for a senior-level position with specialized skills.

Key Skills

Node.js Next.js TypeScript LangChain LlamaIndex Langfuse OpenRouter Prompt Engineering LLM Evaluation AI Infrastructure

Cover Letter Sample

Dear Hiring Manager,

I am excited to apply for the AI Engineer position at Ruby Labs. With extensive experience in Node.js, Next.js, and TypeScript, I have a strong track record of building production-ready LLM systems. I have worked extensively with LangChain and LlamaIndex to design complex prompt workflows and structured outputs, ensuring high-quality and reliable AI responses.

My expertise includes implementing observability with Langfuse and performing deep debugging and evaluations to optimize cost and latency. I have also run systematic A/B tests across models via OpenRouter, making data-driven decisions to improve performance. I am eager to bring my skills in prompt engineering and AI infrastructure to Ruby Labs and contribute to innovative consumer products.

Thank you for considering my application. I look forward to the possibility of discussing how I can add value to your team.

Sincerely,
[Your Name]

Possible Interview Questions

Describe your experience with LangChain or LlamaIndex. Can you walk us through a complex LLM workflow you designed?

I have used LangChain extensively to build multi-step chains that involve retrieval, prompt templating, and output parsing. For example, I designed a customer support system that first retrieves relevant documentation, then generates a response using a dynamic prompt that includes context, and finally validates the output against a JSON schema to ensure structured data. I used LangChain's built-in tools for tracing and debugging to optimize performance.
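The validation step in an answer like this can be illustrated with a minimal TypeScript sketch. It uses a hand-written check as a stand-in for a Zod or JSON Schema validator, and the `SupportReply` shape, its field names, and `parseSupportReply` are hypothetical, chosen only for illustration.

```typescript
// Hypothetical reply shape for the support example; field names are assumptions.
interface SupportReply {
  answer: string;
  sources: string[];
  confidence: number; // expected range 0..1
}

type ParseResult =
  | { ok: true; value: SupportReply }
  | { ok: false; error: string };

// Parse raw model text and check it against the expected shape.
function parseSupportReply(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, error: "expected a JSON object" };
  }
  const obj = data as { [key: string]: unknown };
  const answer = obj["answer"];
  const sources = obj["sources"];
  const confidence = obj["confidence"];
  if (typeof answer !== "string" || answer.length === 0) {
    return { ok: false, error: "answer must be a non-empty string" };
  }
  if (!Array.isArray(sources) || !sources.every((s) => typeof s === "string")) {
    return { ok: false, error: "sources must be an array of strings" };
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    return { ok: false, error: "confidence must be a number between 0 and 1" };
  }
  return { ok: true, value: { answer, sources: sources as string[], confidence } };
}
```

Returning a tagged result instead of throwing lets the calling chain decide whether to retry the model call, fall back, or surface the error.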

How do you approach prompt engineering for complex tasks? Can you give an example of a dynamic prompt you created?

I approach prompt engineering by first defining the input variables and expected output structure. For a project that required generating personalized email summaries, I created a prompt that dynamically injected the user's name, recent activity, and preferences. The prompt included conditional logic to handle different user segments, and I used few-shot examples to guide the model. I iteratively tested and refined the prompt based on evaluation metrics.
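A dynamic prompt of the kind described above might look like the following TypeScript sketch. The `UserContext` fields, the segment names, the word limits, and the few-shot texts are all invented for illustration and do not come from the posting.

```typescript
// Hypothetical user shape; field and segment names are illustrative assumptions.
interface UserContext {
  name: string;
  segment: "new" | "active" | "lapsed";
  recentActivity: string[];
  prefersBrief: boolean;
}

// One few-shot example per segment (invented sample text).
const FEW_SHOT: Record<UserContext["segment"], string> = {
  new: 'Example: "Welcome aboard, Sam. Here is how to get started."',
  active: 'Example: "Nice streak, Sam. You completed 3 sessions this week."',
  lapsed: 'Example: "We missed you, Sam. Here is what changed since your last visit."',
};

// Build the prompt from input variables, with conditional sections.
function buildSummaryPrompt(user: UserContext): string {
  const lines: string[] = [
    "You write short, friendly email summaries.",
    `Recipient name: ${user.name}`,
  ];
  // Only list activity when there is some; otherwise tell the model not to invent any.
  if (user.recentActivity.length > 0) {
    lines.push("Recent activity:");
    for (const item of user.recentActivity) {
      lines.push(`- ${item}`);
    }
  } else {
    lines.push("The user has no recent activity; do not invent any.");
  }
  // Length limit depends on a user preference.
  lines.push(user.prefersBrief ? "Keep it under 50 words." : "Keep it under 150 words.");
  // The few-shot example is selected by user segment.
  lines.push(FEW_SHOT[user.segment]);
  return lines.join("\n");
}
```

Keeping the template as a pure function of its inputs makes each prompt variant easy to snapshot-test and to compare in evaluations.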

Explain how you would use Langfuse to debug a slow or costly LLM chain. What metrics would you look at?

I would set up tracing in Langfuse to capture each step of the chain, including token usage, latency, and cost per call. I would look at the trace waterfall to identify bottlenecks, such as a retrieval step taking too long or a prompt generating many tokens. I would also analyze the cost breakdown to see if a cheaper model could be used for certain steps. Based on the data, I might optimize the prompt to reduce tokens or switch to a faster model.
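The bottleneck analysis described above can be sketched in TypeScript over exported span data. This does not call the Langfuse SDK; `SpanRecord` is a hypothetical record shape assumed to have been exported from whatever tracing tool is in use, and `summarizeSteps` is an illustrative helper.

```typescript
// Hypothetical exported span record; not the Langfuse data model.
interface SpanRecord {
  step: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

interface StepSummary {
  step: string;
  calls: number;
  totalLatencyMs: number;
  totalTokens: number;
  totalCostUsd: number;
}

// Aggregate spans per chain step and rank steps by total latency, slowest first.
function summarizeSteps(spans: SpanRecord[]): StepSummary[] {
  const byStep: { [step: string]: StepSummary } = Object.create(null);
  const summaries: StepSummary[] = [];
  for (const span of spans) {
    let agg = byStep[span.step];
    if (!agg) {
      agg = { step: span.step, calls: 0, totalLatencyMs: 0, totalTokens: 0, totalCostUsd: 0 };
      byStep[span.step] = agg;
      summaries.push(agg);
    }
    agg.calls += 1;
    agg.totalLatencyMs += span.latencyMs;
    agg.totalTokens += span.inputTokens + span.outputTokens;
    agg.totalCostUsd += span.costUsd;
  }
  return summaries.slice().sort((a, b) => b.totalLatencyMs - a.totalLatencyMs);
}
```

Sorting by `totalCostUsd` instead gives the cost view, which helps spot steps where a cheaper model could be tried.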

Describe your experience with AI A/B testing. How do you determine which model or prompt is better?

I have run A/B tests comparing different models (e.g., GPT-4 vs. Claude) and prompt variants using OpenRouter. I define quantitative metrics such as response accuracy, latency, cost per request, and user feedback scores. I size each experiment so that differences can be tested for statistical significance, and I use Langfuse to collect trace data for each variant. I analyze the results using dashboards and make deployment decisions based on the metrics that align with business goals.
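One way to check significance for a pass/fail metric is a two-proportion z-test, sketched below in TypeScript. This is one common choice, not a method the posting prescribes. `VariantResult` and the function names are hypothetical, and the error function uses the Abramowitz and Stegun 7.1.26 approximation.

```typescript
// Hypothetical per-variant tally: how many evaluated responses passed.
interface VariantResult {
  passes: number;
  total: number;
}

// Abramowitz and Stegun 7.1.26 approximation of erf (absolute error about 1.5e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Two-sided two-proportion z-test comparing the pass rate of B against A.
function twoProportionZTest(a: VariantResult, b: VariantResult): { z: number; pValue: number } {
  const pA = a.passes / a.total;
  const pB = b.passes / b.total;
  const pooled = (a.passes + b.passes) / (a.total + b.total);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / a.total + 1 / b.total));
  if (se === 0) {
    // Both variants passed (or failed) every case: no evidence of a difference.
    return { z: 0, pValue: 1 };
  }
  const z = (pB - pA) / se;
  // Two-sided p-value: 2 * (1 - Phi(|z|)) = 1 - erf(|z| / sqrt(2)).
  const pValue = 1 - erf(Math.abs(z) / Math.SQRT2);
  return { z, pValue };
}
```

A variant would typically be promoted only when the p-value falls below a threshold fixed before the experiment starts, and when latency and cost per request remain acceptable.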

How do you ensure the quality and reliability of AI outputs in production?

I implement a multi-layered approach: first, I design robust prompts with structured output schemas to reduce hallucinations and errors. Second, I use Langfuse for real-time monitoring and scoring, setting up alerts for anomalies. Third, I run automated evaluations using test datasets and custom scoring systems to catch regressions. Finally, I have a feedback loop where user interactions are logged and periodically reviewed to fine-tune models or prompts.

About us

Ruby Labs is a leading tech company that creates and operates innovative consumer products. We offer a diverse range of opportunities across the health, education, and entertainment industries. Our innovative teams are driving the future of consumer-led products, and we’re always looking for passionate individuals to join us. Learn more about our story at: https://rubylabs.com/about-us/

About the role

At Ruby Labs, we’re seeking a senior AI Engineer (Node.js / Next.js / TypeScript) to shape our AI infrastructure and drive production-ready LLM experiences. You’ll work in a modern stack, making data-driven decisions around model performance, reliability, and cost.

You’ll own advanced prompt systems, structured outputs, and complex LLM workflows using LangChain or LlamaIndex. Observability, debugging, and evaluation are core to the role, leveraging Langfuse and AI gateways like OpenRouter to continuously improve model quality and operational efficiency. You’ll take full ownership of key AI features from experimentation to live production.

Key Responsibilities

Advanced Prompt Engineering: Designing complex, dynamic prompt templates with conditional logic and efficiently reusing information and context within prompts to maximize generation quality and reasoning.
Structured Outputs & Schemas: Implementing various response formats (JSON mode, function calling, Zod/JSON schemas) to ensure AI outputs are predictable and ready for seamless integration into application logic.
Prompt Engineering & Evaluations: Building robust evaluation pipelines and using Langfuse to collect feedback and score the quality of responses in real time.
Tracing & Debugging: Performing deep debugging of complex LLM chains using Langfuse traces to identify bottlenecks and optimize for cost, latency, and context window usage.
AI A/B Testing: Running systematic experiments across different models via OpenRouter (e.g., comparing Claude 3.5 Sonnet vs. GPT-4o) and analyzing results based on quantitative metrics.
Data-Driven Decisions: Making deployment decisions for new prompts or models strictly based on quantitative benchmarks and trace data, rather than intuition.
Output Scoring & Analysis: Developing scoring systems to analyze the “Problem → Solution” chain and identify root causes of hallucinations or logic errors using Langfuse analytics.
Model Performance & Fine-Tuning: Regularly re-evaluating model performance as new architectures emerge and performing fine-tuning when necessary to meet specific domain requirements.

Qualifications

Node.js & Next.js: Deep knowledge of the stack to build reliable services and handle complex LLM-generated data.
Dynamic Prompting Skills: Proven experience in building prompts where content is highly dependent on input variables and context injection.
OpenRouter Experience: Experience working with unified APIs, managing rate limits, and selecting the most cost-effective models for specific tasks.
Langfuse (or similar): Understanding of LLM observability principles — setting up tracing, creating test datasets, and integrating scoring systems.
Evaluation Methodology: Experience with frameworks like RAGAS or building custom “LLM-as-a-judge” systems.
Analytical Mindset: Ability to transform raw generation logs into actionable business metrics and technical insights.
Iterative Mindset: Focus on continuous product improvement through constant feedback loops.

Nice to have

Fine-Tuning: Practical experience in fine-tuning models for specific domain tasks or JSON compliance.
RAG Architecture: Understanding how to build and optimize Retrieval-Augmented Generation systems, including indexing, retrieval, and re-ranking.
Python: Basic knowledge for working with data science scripts or AI evaluation libraries.

Location

Ruby Labs operates within the CET (Central European Time) zone. Applicants from any country are welcome to apply for the position as long as they are located within approximately ±4 hours of CET. This ensures optimal collaboration and communication during working hours.

Benefits

Discover the perks of being part of our vibrant team! We offer:

Remote Work Environment: Embrace the freedom to work from anywhere, anytime, promoting a healthy work-life balance.
Unlimited PTO: Enjoy unlimited paid time off to recharge and prioritize your well-being, without counting days.
Paid National Holidays: Celebrate and relax on national holidays with paid time off to unwind and recharge.
Company-provided MacBook: Experience seamless productivity with top-notch Apple MacBooks provided to all employees who need them.
Flexible Independent Contractor Agreement: Unlock the benefits of flexibility, autonomy, and entrepreneurial opportunities. Benefit from tax advantages, networking opportunities, reduced employment obligations, and the freedom to work from anywhere. Read more about it here: https://docs.google.com/document/d/1nkrN76JlZkbKj9WSOhlT1_mni_CZeDkHdwfIjPXVwvk/preview?tab=t.0#heading=h.ndsdl4wapxtt

Be part of our fast-growing team and seize this excellent opportunity for personal and professional growth!

Interview Process

After submitting your application, we conduct a thorough review which typically takes 3 to 5 days, but may occasionally take longer due to the volume of applications received. If we see a potential fit, we proceed with the following steps:

Recruiter Screening (40 minutes)
Technical Interview (60 minutes)
Final Interview (30 minutes)

Life at Ruby Labs

At Ruby Labs, we move fast, aim high, and expect the same from our team. We’re not here to play small—we’re here to build, grow, and win. That means we look for people who are ambitious, driven, and ready to give their best every single day.

This is a place for individuals who thrive under pressure, embrace challenges, and see opportunity in every obstacle. If you’re hungry to achieve, motivated by impact, and want to grow at the speed of your own ambition, Ruby Labs offers the platform to make it happen.

Here, effort is matched with reward. We recognize those who go all in and deliver results, and we create space for people who want more—more responsibility, more growth, and more success.

#LI-Remote
