Remote Production Support Engineer @ Amount

The Production Support Engineering team plays a key role at Amount by ensuring production issues are managed efficiently and effectively. You will manage high-priority issues to resolution following industry best practices. You’ll troubleshoot, fix, and apply workarounds to resolve technical issues across multiple platforms. Each day, you’ll interact with every aspect of our organization to find the best solution for our partner. Management of ticket queues, monitoring for issues and post-release validation are also a large part of this role, all while meeting our partner’s SLA requirements.

Team: This role interacts with nearly every group within the organization, including engineering, product, QA, customer success and others.
Salary: $63,000-$73,000 base salary
Bonus & Equity: Amount employees are eligible for annual performance bonuses and equity grants as part of our commitment to shared success!
Similar job titles: Production Support, Production Support Analyst, Incident Manager, Incident Coordinator, IT Major Incident Manager, Application Support Engineer, Support Engineer

WHAT WE’LL TRUST YOU TO DELIVER

Technical ability to deep dive into issues by querying tables, analyzing data and problem-solving
Prioritization and triage of incoming requests/issues
Drive incident resolution and lead conversations with cross-functional groups. Ask the right questions to help determine impact/priority and the correct route for resolution. Oversee a technical bridge, if required.
Management of all incidents through the incident management lifecycle
Documentation of all relevant events, gathering status reports while driving decision-making and resolution
Ensure stakeholders are updated according to predefined service level agreements
Completion and ownership of the postmortem with appropriate root cause analysis performed
Improvement suggestions to capture preventative measures that will avoid recurrences of incidents
Investigate patterns that indicate larger overall issues, even if we don’t have the solution.
Compilation of metrics on a weekly and monthly basis. Maintain dashboards for service incidents and ad hoc reporting as requested
Play an active role during critical incidents, which may occur outside of normal business hours. Availability on nights, weekends, and holidays as part of an on-call rotation is a must
Creation of runbooks or standard operating procedures (SOP) so we can all learn from each other and add to our knowledge base

WHAT YOU LIKELY BRING TO THE TABLE

Technical and/or engineering background, ideally with experience writing SQL queries
Experience working with development teams in a fast-paced environment
Basic knowledge of, or interest in, a programming language such as Java, Python or Ruby
2 years of experience coordinating and executing major incidents, with demonstrated capacity to lead under pressure
Previously collaborated with a wide spectrum of internal and external stakeholders
Worked in an organization with a complex business environment
Leadership skills with the ability to make quick decisions
Familiar with ITSM/ITIL concepts
A self-starter who thrives when leading others during stressful situations
Familiar with tools such as Confluence and Jira, on-call management software such as PagerDuty, and error monitoring software such as Sentry or Kibana

ABOUT AMOUNT (TL;DR)

Founded: 2020
Employees: 150+
Locations: Chicago (HQ) and US Remote
Funding: Amount has raised $281M in total equity capital since inception, including most recently at a valuation of $1B. Investors include WestCap, Hanaco Ventures, Goldman Sachs, Invus Opportunities, Mastercard, and PSCU