Job Description
Software Engineering Manager II, Site Reliability Engineering
Job Summary: Lead a team of engineers to design, build, and maintain large-scale distributed systems, ensuring high availability, scalability, and performance.
Job Description:
As a Software Engineering Manager II, you will lead a team of engineers who design, build, and maintain large-scale distributed systems, with a focus on high availability, scalability, and performance.
You will own the end-to-end availability and performance of key services, build automation to prevent problem recurrence, mentor the team, and lead by example, establishing credibility through quality technical execution.
Key Responsibilities:
- Lead a team of Software/Systems Engineers on projects for users and be directly responsible for uptime
- Own end-to-end availability and performance of key services and build automation to prevent problem recurrence
- Automate response to all non-exceptional service conditions
- Lead by example, mentor the team, and establish credibility through quality technical execution
- Manage on-call rotations across continents, using a follow-the-sun model
- Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of Google's services
Requirements:
- Bachelor's degree in Computer Science, a related field, or equivalent practical experience
- 8 years of experience with data structures or algorithms
- 5 years of experience with software development in one or more programming languages
- 3 years of people management experience, and experience designing, analyzing, and troubleshooting distributed systems
- Experience working in computing, distributed systems, storage, or networking
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
- Ability to debug and optimize code and to automate routine tasks
- Systematic problem-solving approach, coupled with effective communication skills
About the Company:
Google is a global company that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
Our Site Reliability Engineering (SRE) team ensures that Google's services have reliability, uptime appropriate to users' needs, and a fast rate of improvement.
We promote a culture of diversity, intellectual curiosity, problem solving, and openness, and encourage collaboration, thinking big, and taking risks in a blame-free environment.