Senior Manager, Infrastructure Reliability and AIOps Engineering

Remote from
Hungary
Annual salary
Undisclosed
Salary information is not provided for this position. Check our Salary Directory to estimate the average compensation for similar roles.
Employment type
Full Time
Apply before
16 Mar 2026
Experience level
Senior

About Genesys

The world's #1 customer experience platform.

Verified job posting
This job post has been manually reviewed for authenticity and compliance.

Genesys empowers organizations of all sizes to improve loyalty and business outcomes by creating the best experiences for their customers and employees. Through Genesys Cloud, the AI-powered Experience Orchestration platform, organizations can accelerate growth by delivering empathetic, personalized experiences at scale to drive customer loyalty, workforce engagement, efficiency and operational improvements.

We employ more than 6,000 people across the globe who embrace empathy and cultivate collaboration to succeed. While we offer great benefits and perks like those of larger tech companies, our employees have the independence to make a larger impact on the company and take ownership of their work. Join the team and create the future of customer experience together.

Role summary

The Sr. Manager, Infrastructure Reliability and AIOps Engineering is accountable for improving reliability, observability, and automated recovery across Cloud Infrastructure, Networking, Enterprise Tools, and IAM. This leader builds and operates the Reliability Engineering function using AIOps practices and is accountable for day-to-day operational outcomes, including incident response, escalations, and restoration quality. The role leads Reliability Analysts and partners with domain teams, ITSM/Platform Enablement, and Security to prevent incidents, reduce alert noise, and improve recovery performance.

Scope and accountability

This role is accountable for:
• Operational ownership of event-driven incidents, including active participation in incident response, ticket escalation management, and coordination through resolution and restoration.
• AIOps outcomes and governance for platform operations: event ingestion, normalization, correlation, alert quality, intelligent routing, and automated event-to-incident workflows.
• Reliability outcomes across Cloud Infrastructure, Networking, Enterprise Tools, and IAM (SLO attainment, improved availability/latency where applicable, MTTD/MTTR reduction, reduced repeat incidents).
• Signal quality management (alert hygiene, deduplication, suppression, threshold tuning, enrichment, and ownership mapping) to improve signal-to-noise and reduce operational toil.
• Event correlation standards and service impact intelligence (dependency mapping, CI/service association, and prioritization logic aligned to CMDB/ITSM).
• Automation quality and “production readiness” for self-healing workflows across all platform domains (validation, rollback, auditability, and measurable success criteria).
• Reliability operating cadence (incident triage standards, major incident support model, post-incident reviews, problem trend management, and reliability roadmap governance).
• Reliability standards for telemetry, runbooks, monitoring coverage, and operational readiness checks (aligned to ITSM practices and security/compliance needs where applicable).
• Predictive, avoidance-driven IT operations that anticipate and prevent incidents rather than only reacting to them.

Key responsibilities

1) Reliability operations leadership

• Own the reliability execution model from signal → event → incident → restoration, including active incident engagement, escalation management, and accountability for ticket progression and resolution quality.
• Operate and continuously improve the AIOps layer: event ingestion/normalization, correlation rule design, enrichment, deduplication, suppression, and noise reduction.
• Drive measurable improvements in operational performance, tracked through alert-quality KPIs (false positives, duplicates, unassigned events, time-to-triage).
• Lead post-incident reviews with a prevention mindset; convert lessons learned into problem records, reliability backlog items, and automation candidates with clear owners and due dates.
• Establish a consistent “incident learning → reliability backlog → automation delivery” feedback loop with Cloud, Network, Tools, and IAM teams.

This role is expected to balance strategic reliability improvements with hands-on operational leadership during incidents and escalations, especially while AIOps capabilities and automation maturity are being established.

2) SLOs, observability, and proactive detection

• Define reliability measurement across services and platforms: SLIs, SLOs, scorecards, and operational thresholds tied to customer impact and business priorities.
• Ensure telemetry standards are implemented across domains (metrics, logs, traces where applicable) to enable fast correlation, accurate impact analysis, and actionable alerts.
• Mature service health views and early warning signals by improving dependency awareness and context enrichment (service, CI, owner, criticality, user impact).
• Partner with domain SMEs to identify “leading indicators” and implement proactive detection and prevention patterns.

3) Automation and self-healing coverage

• Build and execute a reliability automation roadmap that reduces manual intervention, accelerates recovery, and improves operational consistency.
• Ensure reliability workflows are validated prior to release, with clear rollback, verification steps, and success metrics (automation success rate, time saved, MTTR impact).
• Lead development of event-triggered remediation and guardrails that safely automate recurring recovery actions, aligned with ITSM and change controls where required.
• Establish standards for runbooks and automated playbooks so that every recurring issue has both a clear manual path and an automation path.

4) Domain reliability accountability

• Cloud Infrastructure: Drive resilient operational patterns, standardized health signals, runbook maturity, and automated recovery paths for critical services.
• Networking: Improve detection of degradation, accelerate isolation and restoration, and mature automated health validation, rollback, and recovery routines.
• Enterprise Tools: Establish monitoring and reliability standards for enterprise platforms, including readiness checks, health indicators, and integration dependency monitoring.
• IAM: Ensure identity lifecycle automation (provisioning, deprovisioning, access changes) is reliable and observable; reduce failures through monitoring, guardrails, and automated recovery/alerting.

5) Cross-functional leadership and governance

• Partner with ITSM/Platform Enablement to strengthen event-to-incident flows, categorization, routing, major incident engagement, and service mapping alignment.
• Partner with Security and Compliance to ensure reliability supports control execution (monitoring coverage, evidence quality, exception handling, and remediation reliability).
• Establish reliability communications and governance cadence: weekly health review, monthly scorecard, quarterly roadmap outcomes, and prioritized reliability investment decisions.
• Align domain teams on reliability standards and adoption (telemetry, runbooks, alerting conventions, correlation requirements, and automation intake).

Required qualifications

• 8+ years in infrastructure operations, SRE, reliability engineering, or platform operations (or equivalent experience).
• 5+ years leading teams in an operations, reliability, or engineering environment.
• Proven track record of designing, architecting and building reliability through AIOps/event correlation, observability, automation, and incident learning.
• Experience building and operating alert/event management practices (signal quality, routing, enrichment, deduplication, suppression, and operational tuning).
• Working knowledge across cloud infrastructure concepts, enterprise networking fundamentals, enterprise tool operations, and IAM lifecycle concepts.
• Strong incident command and stakeholder communication skills, including executive-ready reporting and post-incident facilitation.

Preferred qualifications

• Experience implementing practical SLOs/SLIs, error budgets (where appropriate), and operational scorecards tied to business impact.
• Experience with AIOps platforms and event-to-ITSM integration patterns (event ingestion, correlation, automated ticketing, and routing).
• Scripting/automation leadership (PowerShell, Python, Ansible, Terraform, or similar) and experience operationalizing safe automation at scale.
• Familiarity with service mapping, CMDB dependency modeling, and operational governance practices.
• Experience establishing reliability standards across multiple infrastructure domains and driving adoption through governance and coaching.

If a Genesys employee referred you, please use the link they sent you to apply.

About Genesys:

Genesys® empowers more than 8,000 organizations worldwide to create the best customer and employee experiences. With agentic AI at its core, Genesys Cloud™ is the AI-Powered Experience Orchestration platform that connects people, systems, data and AI across the enterprise. As a result, organizations can drive customer loyalty, growth and retention while increasing operational efficiency and teamwork across human and AI workforces. To learn more, visit www.genesys.com.

Reasonable Accommodations:

If you require a reasonable accommodation to complete any part of the application process, or are limited in your ability to access or use this online application and need an alternative method for applying, you or someone you know may contact us at [email protected].

You can expect a response within 24–48 hours. To help us provide the best support, click the email link above to open a pre-filled message and complete the requested information before sending. If you have any questions, please include them in your email.

This email is intended to support job seekers requesting accommodations. Messages unrelated to accommodation, such as application follow-ups or resume submissions, may not receive a response.

Genesys is an equal opportunity employer committed to fairness in the workplace. We evaluate qualified applicants without regard to race, color, age, religion, sex, sexual orientation, gender identity or expression, marital status, domestic partner status, national origin, genetics, disability, military and veteran status, and other protected characteristics.

Please note that recruiters will never ask for sensitive personal or financial information during the application phase.
