At AIA we've started an exciting movement to create a healthier, more sustainable future for everyone.
As pioneering innovators for over 100 years, we're now transforming our organisation to be faster, simpler and more connected, because we want to be even better equipped to develop digital solutions and experiences that help more people live Healthier, Longer, Better Lives.
To get there, we need people with tech/digital/analytics expertise and passion to help develop positive, sustainable change through digitally enhanced experiences that will impact the lives of millions of people and create a healthier future for everyone.
If you believe in developing a better tomorrow, read on.
About The Role
We are looking for a System / Site Reliability Engineer (SRE) to help ensure the reliability, scalability, and performance of our enterprise systems and services. In this role, you will apply software engineering principles to operations, partner closely with development and infrastructure teams, and build automation that strengthens system stability and efficiency. You will play a pivotal role in bridging the gap between software development and IT operations, driving a culture of resilience, observability, automation, and proactive problem-solving.
Key Responsibilities
- Ensure System Reliability & Availability
- Monitor and report on application performance, and highlight any deviations or issues.
- Collaborate with application engineers and developers to identify root causes and implement durable fixes.
- Incident Management & Root Cause Analysis
- Participate as a Subject Matter Advisor during production incidents and outages.
- Provide insights backed by system monitoring, code review, and database analysis.
- Support postmortem reviews and drive follow-up actions.
- Automation & Tooling
- Automate operational tasks such as monitoring, alerts, and recovery processes.
- Build scripts and internal tools to eliminate manual toil and improve operational efficiency.
- Monitoring & Observability
- Implement telemetry and observability practices to track system health, latency, and error rates.
- Manage the Dynatrace platform and its integrations with application services.
- Support teams in designing dashboards and visualization setups.
- Security & Compliance
- Work with Security teams to ensure systems comply with regulatory and industry standards (e.g., PCI DSS, GDPR).
- Implement necessary access controls, encryption, and audit capabilities within SRE scope.
- Capacity Planning & Performance Optimization
- Analyze usage trends to forecast demand and support scaling decisions.
- Contribute to cost-performance optimization efforts across infrastructure and applications.
- Collaborate closely with development, QA, and infrastructure teams to embed reliability into the SDLC.
- Documentation & Knowledge Sharing
- Maintain clear and up-to-date operational documentation, runbooks, and architecture diagrams.
- Champion SRE principles across the organization to foster resilience and accountability.
Job Requirements
Education
- Bachelor's degree in Computer Science, Software Engineering, IT, or related fields.
Experience
- 3-5 years of experience in SRE, DevOps, or Software Engineering roles.
- Experience supporting frontend applications in production environments, ideally within financial services or other regulated industries.
Technical Skills
- Strong understanding of frontend performance monitoring and instrumentation.
- Hands-on experience with Real User Monitoring (RUM), Synthetic Monitoring, and APM tools (e.g., Dynatrace, New Relic, Datadog).
- Proficiency in building dashboards and alerts using Dynatrace, Grafana, Prometheus, Elastic Stack, or Splunk.
- Familiarity with OpenTelemetry for distributed tracing.
- Scripting skills in Python, Bash, or JavaScript.
- Experience with CI/CD pipelines (e.g., GitHub Flow).
- Practical experience with cloud technologies (AWS or Azure).
- Knowledge of Docker and Kubernetes.
- Understanding of secure coding practices for frontend applications.
- Awareness of financial compliance standards such as PCI DSS.
Why Join Us
- Be part of a high-impact team shaping system resilience across the enterprise.
- Work with modern observability and automation technologies.
- Influence engineering culture through SRE best practices.
- Opportunities to innovate and drive real improvements in system reliability.
Build a career with us as we help our customers and the community live Healthier, Longer, Better Lives.
You must provide all requested information, including Personal Data, to be considered for this career opportunity. Failure to provide such information may influence the processing and outcome of your application. You are responsible for ensuring that the information you submit is accurate and up-to-date.