SRE Manager - Apple Inc.
  • Dublin, Leinster, Ireland
  • via BeBee.com
Job Description

**Site Reliability Engineering Manager**

**Summary:**

Lead a team of Site Reliability Engineers to ensure the reliability and scalability of cloud services for millions of Apple users.

The Apple Services Engineering team is one of the most exciting examples of Apple's passion for combining art and technology. The Cloud Service Infrastructure team is responsible for building and supporting critical infrastructure systems and frameworks that provide services like structured and unstructured storage, caching, queueing, searching, and more at hyperscale. These systems form the platform upon which many iCloud and other backend systems at Apple are built.

**Key Responsibilities:**

* Lead a platform-focused SRE team responsible for the reliability of the platform
* Act as the Service Owner, designing and mapping key performance indicators to achieve the organization's mission
* Implement structured engineering and operations processes
* Lead the team in daily agile SRE practices, ensuring proper team focus on priorities, achievements, and deliverables
* Optimize velocity and efficiency of delivery, and drive continuous improvement

**Requirements:**

* Experience in critical, large-scale distributed systems, combining hardware, operating systems, and software
* Strong emphasis on SRE as an engineering subject area, with proficiency in at least one of the following languages: Golang, Rust, Python, Swift
* Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts
* Superb interpersonal skills, capable of working with multi-functional technical and business teams and varying levels of management, and of influencing decision making
* Bachelor's or Master's in Computer Science, Computer Engineering, or equivalent experience

**Preferred Qualifications:**

* Working with large bare-metal infrastructure and release management
* Experience with large-scale server provisioning, fleet management, and maintenance
* Experience with development within the Kubernetes ecosystem, including operator framework, controllers, and CRDs
* Automating operations processes via services and tools
* Configuration management and fleet orchestration via Puppet, Chef, Ansible, or others
