System Development Engineer, AWS Incident Tooling
  • Dublin, Leinster, Ireland
Job Description

System Development Engineer, AWS Incident Tooling & Response

DESCRIPTION

Amazon Web Services is the largest consumer cloud offering in the world, powering cutting edge science, rapidly growing start-ups and industry-leading companies.

The AWS Incident Response Systems team is building systems to ensure these AWS customers can rely on the highest-availability, lowest-latency cloud platform on the planet. We work closely with the teams who own the largest AWS products, building systems to detect and mitigate operational issues before they impact customers. We are looking for a knowledgeable and experienced system development engineer to help us succeed in this mission.

As a System Development Engineer at AWS Incident Response Systems, you will join the team in designing and implementing systems that automate fault containment, problem diagnosis, and issue resolution across multiple hugely distributed, always-on architectures. These systems will take metric and dependency data from multiple sources, analyse it, and correlate it with customer impact to determine the root cause of an issue without human intervention. They will also create engagements and facilitate communication and coordination of the response and mitigation. As the scale and complexity of AWS grow, this is the best way we can offer our customers a stable and reliable cloud computing platform. We succeed once these systems detect, diagnose, and repair operational defects without customer impact or human intervention.

You will work with teams across AWS to drive adoption of the software built by the team and to influence systems development practices for new and existing products. You will define availability goals for service teams across AWS, along with strategies to make these goals attainable with minimal effort. Your goal will be to remove human error from the day-to-day operations of the massive, always-on, distributed systems that make up AWS.

If this sounds like the right challenge for you, then please apply today.

AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain - and we're looking for talented people who want to help.

Key job responsibilities:

1. Write well-tested, maintainable code.
2. Design, contribute to, and maintain systems which solve customer problems.
3. Work with teammates to improve code quality, system architecture, team processes, and operations.
4. Learn about the incident management processes supported by the team's systems to identify improvement opportunities.

BASIC QUALIFICATIONS

- Knowledge of systems engineering fundamentals (networking, storage, operating systems)
- Experience (non-internship) in professional software development
- Experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
- Experience in networking, storage systems, operating systems and hands-on systems engineering
- Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby

PREFERRED QUALIFICATIONS

- Experience with PowerShell (preferred), Python, Ruby, or Java
- Experience working in an Agile environment using the Scrum methodology

Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build.