Site Reliability Engineer (SRE)
Your new company
We are a leading global provider of digital customer experience and outsourcing solutions, supporting major technology, e-commerce, fintech, and digital platform brands through customer support, technical assistance, content moderation, sales enablement, and digital operations. Our Malaysia hub operates in a dynamic, multilingual environment and is part of a wider award-winning network recognized for innovation, service quality, and a strong people-focused culture. We offer modern workspaces, competitive compensation, and clear career-growth pathways, guided by a talent-first philosophy that empowers employees to thrive in the digital economy.
Your new role
- Architect, develop, and support resilient, scalable, and fault-tolerant infrastructure, working closely with engineering teams to ensure systems are built for performance and reliability.
- Create and refine automation workflows to boost efficiency, reduce manual effort, and streamline recurring operational tasks.
- Track and evaluate system performance to detect and resolve issues proactively, ensuring the platform can sustain rapid increases in traffic and machine-learning workloads.
- Participate in a 24/7 on-call schedule (including planned shifts and holidays), respond responsibly to incidents, perform root-cause analysis, and lead blameless postmortems to avoid repeat failures.
- Deploy and manage monitoring frameworks built around SLIs, SLOs, and SLAs, along with automated alerts and metrics, to maintain visibility into system health.
- Apply and uphold security best practices, ensuring all infrastructure complies with relevant regulatory and compliance standards.
What you'll need to succeed
- Education: A Bachelor's or Master's degree in Computer Science, IT, Computer Engineering, or a closely related discipline.
- Experience: At least 3 years of hands-on experience in roles such as Site Reliability Engineer, Systems Engineer, or Software Engineer.
- Coding: Skilled in one or more high-level programming languages (such as Python, Go, C++, or Java) along with shell scripting, with a solid grasp of algorithms and data structures.
- Systems: Strong proficiency in Linux environments and open-source technologies, and a solid foundation in network architecture.
- Databases: Good understanding of relational databases and data modeling concepts.
- Containers: Practical experience working with containerization and orchestration tools like Docker and Kubernetes.
- Machine Learning: Familiarity with or exposure to ML frameworks such as TensorFlow, PyTorch, MXNet, or PaddlePaddle.
- Monitoring: Hands-on experience using monitoring solutions and methodologies, including tools like Prometheus and Grafana.
- Soft Skills: Able to think strategically, communicate clearly, and collaborate effectively with cross-functional teams in a fast-paced setting.
What you need to do now
If you're interested in this role, click "Apply now" to forward an up-to-date copy of your CV, or call us now.
If this job isn't quite right for you, but you are looking for a new position, please contact us for a confidential discussion on your career.