AIA Group

System Reliability Engineer, Consultant

  • Posted 9 days ago

Job Description

At AIA we've started an exciting movement to create a healthier, more sustainable future for everyone.

If you believe in developing a better tomorrow, read on.

About the Role

We are looking for a System / Site Reliability Engineer (SRE) to help ensure the reliability, scalability, and performance of our enterprise systems and services. In this role, you will apply software engineering principles to operations, partner closely with development and infrastructure teams, and build automation that strengthens system stability and efficiency. You will play a pivotal role in bridging the gap between software development and IT operations, driving a culture of resilience, observability, automation, and proactive problem-solving.

Key Responsibilities

1. Ensure System Reliability & Availability

  • Monitor and report on application performance, and highlight any deviations or issues.

  • Collaborate with application engineers and developers to identify root causes and implement durable fixes.

2. Incident Management & Root Cause Analysis

  • Participate as a Subject Matter Advisor during production incidents and outages.

  • Provide insights backed by system monitoring, code review, and database analysis.

  • Support postmortem reviews and drive follow-up actions.

3. Automation & Tooling

  • Automate operational tasks such as monitoring, alerts, and recovery processes.

  • Build scripts and internal tools to eliminate manual toil and improve operational efficiency.

4. Monitoring & Observability

  • Implement telemetry and observability practices to track system health, latency, and error rates.

  • Manage the Dynatrace platform and its integrations with application services.

  • Support teams in designing dashboards and visualization setups.

5. Security & Compliance

  • Work with Security teams to ensure systems comply with regulatory and industry standards (e.g., PCI DSS, GDPR).

  • Implement necessary access controls, encryption, and audit capabilities within SRE scope.

6. Capacity Planning & Performance Optimization

  • Analyze usage trends to forecast demand and support scaling decisions.

  • Contribute to cost-performance optimization efforts across infrastructure and applications.

  • Collaborate closely with development, QA, and infrastructure teams to embed reliability into the SDLC.

7. Documentation & Knowledge Sharing

  • Maintain clear and up-to-date operational documentation, runbooks, and architecture diagrams.

  • Champion SRE principles across the organization to foster resilience and accountability.

Job Requirements

Education

  • Bachelor's degree in Computer Science, Software Engineering, IT, or related fields.

Experience

  • 3-5 years of experience in SRE, DevOps, or Software Engineering roles.

  • Experience supporting frontend applications in production environments, ideally within financial services or other regulated industries.

Technical Skills

  • Strong understanding of frontend performance monitoring and instrumentation.

  • Hands-on experience with Real User Monitoring (RUM), Synthetic Monitoring, and APM tools (e.g., Dynatrace, New Relic, Datadog).

  • Proficiency in building dashboards and alerts using Dynatrace, Grafana, Prometheus, Elastic Stack, or Splunk.

  • Familiarity with OpenTelemetry for distributed tracing.

  • Scripting skills in Python, Bash, or JavaScript.

  • Experience with CI/CD pipelines (e.g., GitHub Flow).

  • Practical experience with cloud technologies (AWS or Azure).

  • Knowledge of Docker and Kubernetes.

  • Understanding of secure coding practices for frontend applications.

  • Awareness of financial compliance standards such as PCI DSS.

Why Join Us

  • Be part of a high-impact team shaping system resilience across the enterprise.

  • Work with modern observability and automation technologies.

  • Influence engineering culture through SRE best practices.

  • Opportunities to innovate and drive real improvements in system reliability.

About Company

AIA Group Limited, often known as AIA, is a Hong Kong-based American multinational insurance and finance corporation. It is the largest publicly listed life insurance and securities group in Asia-Pacific. It writes life insurance for individuals and businesses, as well as accident and health insurance, and offers retirement planning, wealth management services, variable contracts, investments, and securities.

Job ID: 144086533
