We are looking for a Senior Site Reliability Engineer with strong experience in AWS, system monitoring, and infrastructure automation. The role involves maintaining and improving the reliability and performance of a cloud-based lending platform used by mid-market and large financial institutions. The ideal candidate will have a solid background in systems engineering and software development, be comfortable working across teams, and take ownership of operational stability and tooling improvements.
Responsibilities:
- Develop a deep understanding of the software, its functions, how it fulfills clients' needs, and how clients use the product.
- Oversee systems to ensure reliability for customers.
- Monitor distributed systems and notify the appropriate people of any potential issues.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
- Partner with development teams to improve services through rigorous testing and release procedures.
Technical Skills:
- Bachelor’s Degree (B.A.) in Computer Science or Design or equivalent four-year degree, or equivalent related experience.
- 5-7 years of proven experience in a Site Reliability or similar role.
- Excellent oral and written communication skills, including facilitation of group presentations, and consulting skills in the English language.
- Possess deep technical experience with AWS, containerization technologies, automated deployment frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture.
- Demonstrate hands-on technical leadership and business impact in combining software engineering skills with systems engineering skills to solve complex automation and reliability challenges.
- Experience working with infrastructure and application monitoring and deployment tools such as New Relic, Sumo Logic, Pingdom (uptime monitoring), CloudTrail, CloudWatch Insights, CloudFormation, CodePipeline, and CodeDeploy.
- Extensive working knowledge of managing AWS and Linux environments.
- Experience working with MSSQL and MySQL in cloud-based environments, as well as demonstrable knowledge of and experience with AWS service technologies, e.g., Aurora, MySQL.
- Experience working with NoSQL database technologies (ideally DynamoDB).
- Experience working with pipeline automation scripting and tooling, e.g., Jenkins, Terraform.
- Knowledge of and experience with programming languages (e.g., C++, Java, PHP) and frameworks/systems (e.g., AWS).
- Ability to learn new languages and technologies strongly preferred.
- Broad understanding of the lending industry, with the ability to become a subject matter expert on the job.
Soft Skills:
- A strong sense of ownership.
- Excellent written and verbal communication and interpersonal skills.
- Able to effectively collaborate with technical and business partners.
- Can take on full projects from beginning to end.
- Problem solver.
- Team Player.
- Advanced English level.