Staff Software Engineer, Site Reliability Engineering
Google's Site Reliability Engineering (SRE) team combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
As a Staff Software Engineer, you will be responsible for designing, analyzing, and troubleshooting large-scale distributed systems to ensure their reliability, uptime, and performance.
Key Responsibilities:
- Engage in the whole lifecycle of services, from inception and design to deployment, operation, and refinement.
- Support services before they go live through system design consulting, software development, capacity planning, and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and conduct blameless postmortems.
Requirements:
- Bachelor's degree in Computer Science or a related field, or equivalent practical experience.
- 5 years of experience with software development in one or more programming languages.
- 8 years of experience with data structures or algorithms.
- 3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems.
Preferred Qualifications:
- Experience working in computing, distributed systems, storage, or networking.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Ability to debug and optimize code and to automate routine tasks.
- Systematic problem-solving approach, coupled with effective verbal and written communication skills.