Senior Site Reliability Engineer

Remote from
Poland
Annual salary
Undisclosed
Salary information is not provided for this position. Check our Salary Directory to estimate the average compensation for similar roles.
Employment type
Full Time
Job posted
Apply before
11 Jun 2026
Experience level
Senior
Views / Applies
230 / 69

About Akamai Technologies

Akamai is the cybersecurity and cloud computing company that powers and protects business online.

Actively Hiring
Verified job posting
This job post has been manually reviewed for authenticity and compliance.

AI Summary

This Senior Site Reliability Engineer role at Akamai involves designing, developing, and managing applications and infrastructure for compute products. The position focuses on improving automation, efficiency, and reliability of large-scale distributed systems. Key responsibilities include developing automated tools, enhancing system monitoring, participating in on-call rotations, and contributing to capacity planning. The ideal candidate has expertise in Linux administration, Kubernetes, and programming languages like Python or Golang. Akamai offers a flexible working program called FlexBase, allowing employees to work from home, office, or both.

Job Complexity

AI Insight: The role requires expert-level experience in Linux/Unix administration, Kubernetes, and configuration management, along with skills in programming and observability tools, making it highly specialized and challenging.

Salary Analysis

Median
$180,000
US Market
$140,000 – $220,000
AI Insight: The salary for this role is estimated based on market data. The median salary for a Senior Site Reliability Engineer in the US is around $180,000, which is competitive for the required expertise in Kubernetes, automation, and large-scale systems.

Key Skills

Site Reliability Engineering, Kubernetes, Linux Administration, Python, Terraform, Prometheus, Grafana, Automation, Incident Response, Capacity Planning

Dear Hiring Manager,

I am writing to express my strong interest in the Senior Site Reliability Engineer position at Akamai. With over 8 years of experience in Linux administration, Kubernetes, and automation, I have a proven track record of improving system reliability and scalability in large-scale distributed environments. My expertise in Python, Terraform, and observability tools like Prometheus aligns well with the requirements of this role.

In my previous role at a leading tech company, I led initiatives to automate deployment processes and reduce incident response time by 40%. I am passionate about building resilient systems and mentoring junior engineers to foster a culture of operational excellence. I am particularly drawn to Akamai's mission of powering and protecting life online and would be thrilled to contribute to your Compute products.

Thank you for considering my application. I look forward to the possibility of discussing how my skills and experience can benefit the Akamai team.

Sincerely,
[Your Name]

Describe your experience with Kubernetes in a large-scale production environment. How did you handle cluster scaling and resource optimization?
I have worked with Kubernetes clusters managing over 500 nodes, using horizontal pod autoscaling and cluster autoscaler to handle traffic spikes. I implemented resource quotas and limits to optimize utilization and reduce costs. For scaling, I used custom metrics from Prometheus to trigger autoscaling based on application latency.
How do you approach designing automation for incident response? Can you give an example of a tool you built?
I built a Python-based automation framework that integrated with PagerDuty and our monitoring stack. When an alert fired, it would automatically run diagnostic scripts, gather logs, and attempt predefined remediation actions like restarting services or scaling up instances. This reduced mean time to resolution by 30%.
How do you set and measure Service Level Objectives (SLOs) for a system?
I collaborate with product teams to define SLOs based on user expectations, such as 99.9% availability or latency under 200ms for 95% of requests. I use Prometheus and Grafana to measure error budgets and alert when burn rates exceed thresholds. This ensures we balance reliability with feature velocity.
Describe a challenging incident you resolved and how you improved the system afterward.
We had a cascading failure due to a misconfigured load balancer that caused a DDoS-like effect. I led the incident response, coordinating with network and application teams to isolate the issue. Post-mortem, I implemented rate limiting, automated failover tests, and added better monitoring to detect similar patterns early.
How do you mentor junior engineers in SRE practices?
I conduct regular knowledge-sharing sessions on topics like incident management and automation. I pair with junior engineers during on-call rotations to review their responses and provide feedback. I also encourage them to contribute to our runbooks and automation scripts, reviewing their code and suggesting improvements.
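
The SLO answer above mentions error budgets and burn-rate alerts. As a rough illustration only, the arithmetic behind those terms can be sketched in Python. The function names and the 14.4 alert threshold are assumptions chosen for this example (a commonly cited fast-burn value), not something taken from the posting or from Akamai's tooling.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error ratio divided by the error budget.

    A value of 1.0 means the budget is being spent exactly as fast as the
    SLO allows; higher values mean it will run out early.
    """
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    observed_error_ratio = failed / total
    return observed_error_ratio / error_budget(slo_target)


def should_alert(failed: int, total: int, slo_target: float,
                 threshold: float = 14.4) -> bool:
    """Hypothetical alert rule: fire when the burn rate exceeds the threshold."""
    return burn_rate(failed, total, slo_target) > threshold


if __name__ == "__main__":
    # 150 failures out of 10,000 requests against a 99.9% availability SLO.
    rate = burn_rate(150, 10_000, 0.999)
    print(f"burn rate: {rate:.1f}, alert: {should_alert(150, 10_000, 0.999)}")
```

In practice the failed and total counts would come from a monitoring system such as Prometheus over a chosen time window; here they are passed in directly to keep the sketch self-contained.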

Are you passionate about cutting edge technology?

Does solving some of the Internet’s most difficult content delivery challenges interest you?

Join our highly skilled Site Reliability team

Our team designs, develops, and manages applications and infrastructure that support Akamai’s Compute products and services. We do this while keeping Akamai’s mission at the forefront of what we do: making life better for billions of people, billions of times a day.

Partner with the best

The Senior Engineer creates solutions to improve automation and efficiency for systems and teams. Responsibilities include optimizing workflows, infrastructure, and applications. Expertise in Linux administration, configuration management, and performance tuning is essential. Collaborate on deployment, monitoring, and resolving incidents. Focus on reliability, scalability, and efficiency through automation and resource optimization. Promote continuous improvement and operational excellence across all systems.

As a Senior Site Reliability Engineer, you will be:

  • Providing support and mentorship for other engineers within the department
  • Developing and maintaining automated tools and scripts to enhance system reliability, deployment processes, and incident response efficiency
  • Improving our system monitoring to speed up error detection and remediation, enhancing the performance and reliability of the virtualization platform
  • Participating in on-call rotations, guiding restoration and repair of service-impacting issues
  • Writing automation and tooling to reduce operational toil, improve deployment safety, and accelerate incident response
  • Contributing to capacity planning, autoscaling configuration, and workload scheduling for AI compute infrastructure

Do what you love

To be successful in this role you will:

  • Possess expert-level experience in a SysAdmin (Linux/Unix administration), DevOps, or SRE role, working with large-scale distributed systems
  • Demonstrate expertise in Kubernetes and large-scale containerization systems
  • Be proficient in at least one programming language (Python/Golang) and in configuration management with Terraform/SaltStack/Ansible
  • Define SLOs and work with observability tools like Prometheus, Grafana, and distributed tracing to enhance system monitoring
  • Have experience with architecting software and infrastructure at scale
  • Demonstrate accountability for reliability, develop automation and monitoring, and collaborate effectively with an engineering team unfamiliar with SRE practices

Work in a way that works for you

FlexBase, Akamai’s Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai. FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent, virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.
Learn what makes Akamai a great place to work

Connect with us on social and see what life at Akamai is like!

We power and protect life online, by solving the toughest challenges, together.

At Akamai, we’re curious, innovative, collaborative and tenacious. We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you’ll thrive here.

Working for you

At Akamai, we will provide you with opportunities to grow, flourish, and achieve great things. Our benefit options are designed to meet your individual needs for today and in the future. We provide benefits surrounding all aspects of your life:

  • Your health
  • Your finances
  • Your family
  • Your time at work
  • Your time pursuing other endeavors

Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.

About us

Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world’s most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.

Join us

Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!
#LI-Remote


This job listing has been manually reviewed by the Jobicy Trust & Safety Team for compliance with our posting guidelines, including verification of the company's legitimacy, accuracy of job details, clarity of remote work policy, and absence of misleading or fraudulent content.


