Join EPAM Malaysia as a Senior Application Support Engineer, where you will own the performance, reliability and scalability of mission-critical platforms. You will manage containerized workloads, optimize database operations and troubleshoot complex production issues while influencing platform-level decisions and improvements. You will partner closely with engineering, operations and product teams to drive seamless service delivery, implement long-term solutions and mentor junior engineers. This role is ideal for senior engineers who thrive in high-ownership, hands-on operational environments and want to make a strategic impact within a modern, cloud-native ecosystem.
Responsibilities
- Lead incident response and resolution for high-severity production issues, ensuring minimal downtime and timely communication with stakeholders
- Monitor and optimize application performance, identifying systemic patterns and driving long-term improvements
- Design, implement and maintain automation workflows and operational tooling to reduce manual intervention
- Conduct and lead root cause analysis (RCA) and postmortems, implementing durable corrective measures and preventive strategies
- Maintain and enhance project documentation, runbooks and operational guidelines to ensure knowledge continuity and platform reliability
- Mentor and guide junior engineers, sharing best practices in application support, DevOps and cloud-native operations
- Collaborate with cross-functional teams to influence platform architecture, deployment strategies and operational excellence initiatives
- Drive continuous improvement initiatives for scalability, availability and operational efficiency across the platform
Requirements
- 7+ years of hands-on experience in software application support, platform operations, or production engineering
- Proven experience operating PostgreSQL in production, including installation, configuration, performance tuning and troubleshooting complex SQL workloads
- Demonstrated expertise with Docker and Kubernetes, including workload management, deployment troubleshooting, Helm chart management and optimization of containerized applications
- Strong proficiency in Linux administration, including logs, system services, permissions and performance diagnostics
- Hands-on experience with monitoring and observability tools (Prometheus, Grafana, ELK, Loki, or similar)
- Proven ability to own end-to-end troubleshooting, lead RCA and implement durable solutions in high-availability production environments
- Excellent communication and collaboration skills, capable of articulating technical concepts to both technical and non-technical stakeholders
Nice to have
- Experience with scripting (Shell, Python, or similar) for automation and operational efficiency
- Exposure to cloud-native ecosystems and modern DevOps practices
- Familiarity with CI/CD pipelines, automation tools and infrastructure-as-code frameworks
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn