Sr System Reliability Engineer - Fulcrum Digital
  • Dublin, Leinster, Ireland
  • via BeBee.com
Job Description

Job Summary: We are seeking a Production Environment Manager to oversee all aspects of a Production Environment, define strategies for Application Performance Monitoring, and respond to Incidents.

Fulcrum Digital Overview

Fulcrum Digital is an agile, next-generation digital acceleration company providing digital transformation and technology services from ideation to implementation.

The Role

  • Plan, manage, and oversee all aspects of a Production Environment
  • Define strategies for application performance monitoring and optimization in the production environment
  • Respond to incidents, improve the platform based on feedback, and measure the reduction in incidents over time
  • Ensure batch production scheduling and processes are accurate and timely
  • Perform ad hoc requests from users such as data research, file manipulation/transfer, research of process issues, etc.
  • Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recovery (MTTR)
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
  • Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns
  • Support services before they go live through activities such as system design consulting, capacity planning and launch reviews
  • Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Scale systems sustainably through mechanisms like automation and evolving systems by pushing for changes that improve reliability and velocity
  • Work with a global team spread across tech hubs in multiple geographies and time zones
  • Share knowledge and explain processes and procedures to others

Requirements

  • Experience with Linux
  • Knowledge of ITSM/ITIL
  • Good to have: experience with industry-standard CI/CD tools such as Git/Bitbucket, Jenkins, and Chef
  • Experience with scripting, pipeline management, and software design
  • Solid grasp of at least one database: Cassandra, Postgres, or Oracle
  • Strong fundamentals in writing SQL queries
  • Experience with PCF (Pivotal Cloud Foundry)
  • Knowledge of Kafka
  • Knowledge of at least one monitoring tool: Dynatrace, Splunk, or Grafana
  • Support experience with event frameworks, event-driven applications, Java/J2EE/Spring/Spring Boot based applications, and cloud-based microservices
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
  • Ability to help debug and optimize code and automate routine tasks
  • Ability to support many different stakeholders
  • Experience in dealing with difficult situations and making decisions
  • Appetite for change and pushing the boundaries of what can be done with automation
  • Experience in working across development, operations, and product teams to prioritize needs and to build relationships
  • Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort
  • Good handle on the change management and release management aspects of software
