(Senior) Site Reliability Engineer (m/f/d) – Platform & Agentic Operations

Remote from
Germany
Annual salary
Undisclosed
Employment type
Full Time
Job posted
Apply before
2 Jul 2026
Experience level
Senior

About 1KOMMA5°

Accelerate CO2 neutral life for all!

Verified job posting
This job post has been manually reviewed for authenticity and compliance.

AI Summary

1KOMMA5° is seeking a Senior Site Reliability Engineer to join their Platform & Agentic Operations team, focusing on building Europe's largest virtual power plant. The role combines classic SRE responsibilities with agentic engineering, leveraging AI agents to automate CI/CD pipelines and resolve deployment bottlenecks. Key technologies include GCP, Terraform, Kubernetes, and observability tools like Datadog. Candidates need 6+ years of SRE experience, proficiency in Python/Go/TypeScript, and knowledge of LLM integration. This is a unique opportunity to impact the energy transition while working with cutting-edge automation.

Role DNA

Job Complexity (scale: Easy to Hard)
Pace & Pressure (scale: Relaxed to Fast-paced)
Autonomy Level (scale: Guided to Full Ownership)
Communication Load (scale: Independent to Highly Collaborative)
AI Insight: The role requires deep technical expertise in SRE, cloud infrastructure, and AI agent integration, plus 6+ years of experience, which makes it challenging. It is not an entry-level position and demands specialized skills.

Salary Analysis

Median (US market): $175,000
Range: $130k – $220k
Chart scale: 0 to $242k (labels: Median, Highly Competitive)
AI Insight: Salary not provided; estimated median for Senior SRE in US market is $175,000. The role is senior-level with specialized skills in AI agents, which may command higher compensation. Benefits and remote flexibility may offset salary.

Key Skills

Site Reliability Engineering, Google Cloud Platform, Terraform, Kubernetes, CI/CD, Python, Go, Observability, AI Agents, Incident Management

Dear Hiring Team,

I am excited to apply for the Senior Site Reliability Engineer position at 1KOMMA5°. With over 6 years of experience in SRE and platform engineering, I have a strong track record of building resilient infrastructure and automating workflows. I am particularly drawn to your focus on Agentic Operations, where I can leverage my expertise in LLMs and CI/CD optimization to reduce developer friction.

In my previous role, I implemented monitoring systems that improved incident response times by 40% and designed automated pipelines using Terraform and GitHub Actions. I am proficient in Python, Go, and TypeScript, and have hands-on experience with GCP and observability tools like Datadog. I am passionate about using technology to drive the energy transition and would love to contribute to your mission.

Thank you for considering my application. I look forward to discussing how my skills align with your team's goals.

Sincerely,
[Your Name]

Describe a time you used AI agents to automate a CI/CD pipeline. What challenges did you face?
At my previous company, I integrated an LLM to analyze test failures in our CI pipeline. The challenge was ensuring the agent had accurate context from logs and dependencies. I implemented a system that fed real-time failure data into the agent, which then suggested fixes. This reduced manual debugging time by 30%.
How do you approach setting and tracking SLOs for a complex system?
I start by identifying critical user journeys and defining SLIs that reflect user experience. For example, for a virtual power plant, I'd track the latency and error rates of API calls. I use tools like Datadog to monitor these SLIs and set SLO targets based on business impact. Tracking error budgets against those targets helps balance reliability and feature velocity.
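To make the error-budget idea in this answer concrete, here is a minimal sketch of the arithmetic. The `SloWindow` shape, its field names, and the request-count based SLI are assumptions for illustration only; they are not taken from the posting or from any Datadog API.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO evaluation window (hypothetical shape)."""
    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means 99.9% of requests must succeed


def error_budget_remaining(window: SloWindow) -> float:
    """Return the unspent fraction of the error budget.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been violated.
    """
    # Number of failures the SLO tolerates in this window.
    allowed_failures = window.total_requests * (1.0 - window.slo_target)
    if allowed_failures == 0:
        # No traffic or a 100% target: any failure exhausts the budget.
        return 0.0 if window.failed_requests else 1.0
    return 1.0 - window.failed_requests / allowed_failures


if __name__ == "__main__":
    window = SloWindow(total_requests=1_000_000, failed_requests=250, slo_target=0.999)
    print(f"Error budget remaining: {error_budget_remaining(window):.0%}")
```

With a 99.9% target over one million requests, about 1,000 failures are tolerated, so 250 failures leave roughly three quarters of the budget. A team might slow feature rollouts as this value approaches zero.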
Explain your experience with GCP services like Cloud Run and GKE. How do you ensure cost efficiency?
I have designed microservices on Cloud Run and managed clusters on GKE. For cost efficiency, I right-size workloads by analyzing resource utilization, implement auto-scaling, and use preemptible VMs for batch jobs. I also use Terraform to enforce tagging and budget alerts.
Describe a post-incident review you led. What improvements resulted?
I led a review after a database outage caused by a misconfigured failover. We identified that our testing didn't cover failover scenarios. We implemented chaos engineering experiments and automated failover tests. This reduced recovery time by 50% and prevented similar incidents.
How would you integrate an LLM agent into incident response?
I would create a pipeline where alerts from Datadog trigger an agent that gathers relevant logs, traces, and recent changes. The agent would then generate a diagnosis and suggest remediation steps. For example, if a deployment caused errors, the agent could roll back automatically after human approval.
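The human-approval step described in this answer can be sketched as a small gate that holds an agent's proposed action until someone approves it. Everything here is hypothetical: `Proposal`, `ApprovalGate`, and the `execute` callback are illustrative names, and no real Datadog, Incident.io, or LLM API is called.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Proposal:
    """A remediation suggested by an agent for one alert (hypothetical shape)."""
    alert_id: str
    diagnosis: str
    action: str  # e.g. "rollback"


@dataclass
class ApprovalGate:
    """Holds agent proposals and runs them only after a human approves."""
    execute: Callable[[Proposal], None]
    pending: dict[str, Proposal] = field(default_factory=dict)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def propose(self, proposal: Proposal) -> None:
        # Store the proposal; nothing is executed at this point.
        self.pending[proposal.alert_id] = proposal

    def approve(self, alert_id: str, approver: str) -> bool:
        # Pop so that the same proposal cannot be executed twice.
        proposal = self.pending.pop(alert_id, None)
        if proposal is None:
            return False
        self.audit_log.append((alert_id, approver))
        self.execute(proposal)
        return True

    def reject(self, alert_id: str) -> None:
        # Discard the proposal without executing it.
        self.pending.pop(alert_id, None)


if __name__ == "__main__":
    gate = ApprovalGate(execute=lambda p: print(f"Running {p.action} for {p.alert_id}"))
    gate.propose(Proposal("alert-1", "Error rate rose after the latest deploy", "rollback"))
    gate.approve("alert-1", approver="on-call engineer")
```

Keeping the executor behind a callback means the same gate could wrap a rollback, a restart, or any other action, while the audit log records who approved what.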

1KOMMA5°

At 1KOMMA5°, we pursue a clear vision: Living on wind and sunlight forever for free. To make this a reality, we are building the energy system of the future with Heartbeat AI. Want to be part of it?
We bring together regional craftsmanship and scalable software: We don’t think of solar, batteries, heat pumps, and e-mobility as isolated components, but control them as an intelligent, integrated overall system in our virtual power plant. Directly connected to the electricity market – in real time, fully automated. This way, energy is used when it is available from renewables and particularly cost-effective. By 2030, our goal is to transition 1.5 million households to renewable energies. Over 3,000 people are working towards this every day, at more than 80 locations worldwide, from Finland to Australia.
Want to take responsibility and build solutions that truly matter? Apply now and help us shape the energy world of tomorrow.
Learn more about our Product & Tech team!

Your Position

1KOMMA5° is building Europe’s largest virtual power plant (“Heartbeat AI”). As a Senior SRE in our Platform team, you will bridge classic infrastructure with Agentic Engineering, specifically focusing on leveraging AI agents to eliminate developer friction, optimize CI/CD pipelines, and automate the resolution of code review and deployment bottlenecks.

Tech Stack

  • Cloud & Infra: GCP (Cloud Run, GKE), Terraform, Terramate

  • Reliability: Incident.io, Datadog (OpenTelemetry)

  • Agentic: Cursor

  • CI/CD & DevEx: GitHub Actions, Backstage

  • Languages: Python, Go, TypeScript

Key Responsibilities (including but not limited to)

  • Implement and improve monitoring, alerting, and incident response systems and processes to ensure high reliability for our customers and meet defined SLOs

  • Design, build, and maintain resilient, scalable infrastructure utilizing SRE principles and best practices

  • Attend post-incident reviews, detect patterns and contribute to continuous improvement efforts

  • Execute performance testing, analyze system bottlenecks, and formulate strategies for capacity planning to ensure our systems meet current and future demands effectively

  • Build systems where CI/CD test failures serve as immediate, real-time context for agents, enabling them to analyze logs, trace dependencies, and suggest or apply instant code fixes.
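As an illustration of the last responsibility, the following minimal sketch turns a raw CI log into a structured payload that an agent could consume. It assumes pytest-style `FAILED <test> - <reason>` summary lines; the `FailureContext` fields and function names are hypothetical, and the step that sends the payload to an agent is deliberately left out.

```python
import json
import re
from dataclasses import asdict, dataclass

# Assumption: the CI log contains pytest-style summary lines such as
# "FAILED tests/test_api.py::test_login - AssertionError: boom".
FAILED_LINE = re.compile(r"^FAILED (?P<test>\S+) - (?P<reason>.+)$")


@dataclass
class FailureContext:
    """Context for one failed test (hypothetical shape)."""
    commit: str
    test: str
    reason: str
    log_tail: list[str]  # log lines that immediately precede the failure line


def build_failure_contexts(log: str, commit: str, tail: int = 20) -> list[FailureContext]:
    """Extract one FailureContext per FAILED line found in the log."""
    lines = log.splitlines()
    contexts = []
    for index, line in enumerate(lines):
        match = FAILED_LINE.match(line.strip())
        if match:
            start = max(0, index - tail)
            contexts.append(
                FailureContext(commit, match["test"], match["reason"], lines[start:index])
            )
    return contexts


def to_agent_payload(contexts: list[FailureContext]) -> str:
    """Serialize the contexts as JSON so they can be placed in an agent prompt."""
    return json.dumps([asdict(context) for context in contexts], indent=2)


if __name__ == "__main__":
    sample_log = (
        "collected 2 items\n"
        "E   assert 1 == 2\n"
        "FAILED tests/test_api.py::test_login - AssertionError: boom\n"
    )
    print(to_agent_payload(build_failure_contexts(sample_log, commit="abc123", tail=5)))
```

In a GitHub Actions workflow, a function like this could run in a step that executes only when the test job fails, with the commit SHA taken from the workflow context.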

Your Profile

  • 6+ years in SRE, DevOps, or Platform Engineering

  • Strong understanding and practical application of Site Reliability Engineering (SRE) principles, methodologies, and best practices

  • Proficiency in programming/scripting languages such as Python, Go, or TypeScript

  • Practical understanding of integrating LLMs into automated workflows. You know how to feed live system state (like a fresh CI test failure) into an agent as actionable context.

  • Prior experience in incident management, post-incident reviews, and implementing improvements to prevent future incidents

  • Ability to troubleshoot complex technical issues systematically and effectively

  • Good experience working with a public cloud provider, ideally Google Cloud Platform (GCP), and a solid understanding of its observability services

  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks

  • Excellent communication skills to convey technical concepts and collaborate effectively with diverse teams

  • Very good knowledge of spoken and written English; German is a plus

  • Residency in Germany

Bonus points for:

  • Interest in climate tech industry

  • Prior experience with IoT applications

  • Having worked in a scale-up environment at a company of similar size

Benefits

  • You are part of an international, dynamic, and highly motivated team of people with a proven record of making things happen
  • With your work, you accelerate the “energy transition” and hence have a direct impact on our climate
  • Work with and learn from other super-smart colleagues
  • You will enjoy direct contact with core decision-makers
  • You will have an excellent opportunity to join one of Europe’s most thriving scale-ups full-time
  • You work remotely (Germany-wide), with offices available in Hamburg, Berlin, or Munich
  • Create a healthy balance alongside your work and enjoy all the benefits of the EGYM Wellpass
  • Benefits and discounts are yours with Futurebens
  • Whether city bike or e-bike – be flexible with our job bike leasing and do something good for the environment at the same time
