Senior Site Reliability Engineer (SRE)

Horizontal Talent

Kuala Lumpur, Malaysia

5-7 Years


Posted 2 months ago

Job Description

About Horizontal: Established in the US in 2003, Horizontal solves complex challenges across two distinct businesses: Horizontal Digital and Horizontal Talent. We are consistently recognized as a top workplace and one of the fastest-growing private companies. Horizontal Talent specializes in staffing for the IT, Digital & Creative, and Business & Strategy markets. We have offices in the US, the UAE, India, Malaysia and Australia.

About The Role

As a Senior SRE, you'll drive the development and execution of strategy for our DevSecOps practices and platform. Your work will ensure seamless collaboration between technology teams, enabling fast, reliable delivery of high-quality software.

You'll work with a team responsible for implementing and managing Infrastructure as Code (IaC), CI/CD pipelines, cloud-native services and microservices, automation frameworks, and release management processes, ensuring they align with organizational objectives.

What You'll Do

Lead the design and implementation of highly available, secure, and scalable banking infrastructure using Infrastructure as Code (IaC) principles
Establish and maintain SLOs/SLIs that define our reliability standards and drive accountability across engineering teams
Serve as an incident commander during critical service disruptions, leading cross-functional response teams with calm expertise
Build and enhance our observability platform, enabling real-time monitoring of the golden signals (latency, traffic, errors, saturation)
Develop automation solutions for incident response, disaster recovery, and business continuity
Drive our DevSecOps platform to enable safe, rapid deployments through CI/CD, GitOps, and self-service capabilities
Lead FinOps initiatives that give tech teams visibility into infrastructure costs and drive ownership of utilization, optimizing spend while maintaining performance and reliability
Mentor junior engineers and contribute to a culture of operational excellence

What We're Seeking

At least 5 years of demonstrated experience in Site Reliability Engineering, DevOps, or equivalent roles
Strong understanding of cloud technologies (AWS, Azure, GCP, Alibaba Cloud)
Experience implementing CI/CD pipelines and GitOps workflows
Deep expertise with infrastructure as code tools (HashiCorp Terraform, OpenTofu, CloudFormation, or similar)
Proven ability to design and implement observability solutions using modern monitoring stacks
Experience leading incident response and building post-mortem processes
Strong understanding of Java or another object-oriented programming (OOP) language
Strong understanding of containerization and orchestration
Experience with messaging systems such as Kafka is an advantage
Familiarity with relational and non-relational databases is a plus
Ability to balance hands-on technical expertise with strategic decision-making
Strong problem-solving skills and the ability to make sound decisions under pressure
A passion for continuous learning, innovation, and professional development
High ownership of responsibilities, with a focus on delivering results and meeting deadlines
Financial services experience is a plus but not required