Unison Group New Zealand

Site Reliability Engineer

  • Posted 6 hours ago

Job Description

  • Design and implement resilient system architectures that support high availability and scalability
  • Develop automation tools and scripts to enhance operational efficiency and reduce manual effort
  • Define, track, and analyze SLOs and SLIs to ensure reliability and performance meet business needs
  • Conduct thorough post-mortem analyses following incidents, driving continuous improvement through root cause identification and solution implementation
  • Collaborate with development and operations teams to establish best practices in system reliability and incident management
  • Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures, including diagnosing problems at the underlying platform level (e.g., Kubernetes, virtual machines)
  • Ensure that issues are resolved within the stipulated Service Level Agreements (SLAs), maintaining high standards of service delivery
  • Identify and troubleshoot performance bottlenecks across systems, providing actionable recommendations for enhancements
  • Maintain detailed documentation of processes and incident responses to support knowledge sharing and compliance

Requirements

  • Proficiency in a programming language such as Python, Golang, or Java, with a focus on operational efficiency
  • Demonstrated experience in system architecture and design, prioritizing reliability and scalability
  • Strong understanding of SRE principles, including SLOs, SLIs, toil reduction, and incident post-mortems
  • Experience with cloud environments (e.g., AWS, Azure, Google Cloud) and their operational management
  • Strong expertise in Linux system administration
  • Proven experience in troubleshooting application support issues with a focus on performance and connectivity
  • Familiarity with networking concepts and effective troubleshooting techniques
  • Excellent problem-solving abilities and a proactive approach to operational challenges
  • Ability to work independently while effectively collaborating within a team environment

Preferred Skills

  • Familiarity with monitoring tools and performance optimization techniques
  • Experience in scripting or automation for system administration tasks
  • Knowledge of networking concepts and troubleshooting methodologies
  • Hands-on knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and their services
  • Familiarity with DevOps practices and frameworks, including CI/CD, infrastructure as code, and containerization

More Info

Job ID: 146944987