MetLife

Observability, Automation & AI Ops Engineer - MetLife HACK4JOB

Job Description

MetLife HACK4JOB - IT Infrastructure Engineering Challenge - Kuala Lumpur

  • Join the #Hack4Job 2026 Hackathon on January 31, 2026, to pursue a rewarding tech career with MetLife.
  • Register by January 19, 11:59 p.m. MYT.

Join us for MetLife Hack4Job, an exciting platform where innovation meets opportunity! This challenge is designed for curious and talented infrastructure engineers who want to push their limits and showcase their technical expertise.

By participating, you'll tackle complex scenarios, demonstrate your skills in automation, cloud, and security, and gain visibility among industry leaders. It's more than a competition; it's your chance to learn, grow, and make an impact. Are you ready to test yourself and build for tomorrow?

Job Role: Observability, Automation & AI Ops Engineer

The Observability, Automation & AI Ops Engineer is responsible for designing, implementing, and optimizing advanced monitoring, automation, and AI-driven operations solutions across MetLife's hybrid cloud and on-premises environments. This role ensures high availability, reliability, and efficiency of IT services by leveraging modern observability platforms, automation frameworks, and artificial intelligence for proactive incident management and continuous improvement.

Key Responsibilities

Observability Engineering

  • Design, deploy, and manage observability platforms (e.g., Elastic, Splunk, Prometheus, Grafana, OpenTelemetry) for end-to-end visibility of applications, infrastructure, and business services.
  • Develop and maintain telemetry pipelines for logs, metrics, traces, and events.
  • Build dashboards and automated alerting systems with AI-powered anomaly detection.
  • Collaborate with DevOps, SRE, and application teams to integrate observability into CI/CD pipelines and cloud-native architectures.
  • Analyze system health, identify trends, and drive data-driven decisions for performance optimization and reliability.

Automation Engineering

  • Design, implement, and maintain automation solutions for infrastructure provisioning, configuration management, and operational workflows (e.g., Ansible, Terraform, CI/CD tools).
  • Develop self-healing scripts and intelligent runbooks for automated incident response and remediation.
  • Integrate automation with monitoring and ITSM tools to streamline operations and reduce manual intervention.
  • Lead or participate in automation projects to improve efficiency, reduce errors, and support business agility.
  • Stay current with emerging automation technologies and best practices.

AI Ops Engineering

  • Implement and maintain AI-driven systems for real-time monitoring, predictive analytics, and automated root cause analysis.
  • Develop and train machine learning models using operational data (logs, metrics, events, traces) for anomaly detection and forecasting.
  • Deploy and manage AIOps platforms (e.g., Moogsoft, Dynatrace, DataDog, Elastic) to enable proactive incident management and self-healing capabilities.
  • Collaborate with IT, DevOps, and Data Science teams to integrate AI/ML into IT operations and service management.
  • Monitor and optimize AI model performance, ensuring reliability and continuous improvement.

Technical Leadership & Collaboration

  • (Senior Level) Mentor junior engineers, provide technical guidance, and lead cross-functional project teams.
  • Drive adoption of observability, automation, and AI Ops best practices across the organization.
  • Participate in technology evaluations, pilots, and rollouts of new solutions.

Qualifications & Skills

Experience:

  • Associate: 0-2 years in observability, automation, or IT operations.
  • Engineer: 2-5 years relevant experience.
  • Senior: 5+ years with demonstrated technical and/or team leadership.

Skills:

  • Proficiency in observability platforms (Elastic, Splunk, Prometheus, Grafana, OpenTelemetry).
  • Strong experience with automation tools (Ansible, Terraform, CI/CD, scripting languages).
  • Familiarity with AIOps platforms and AI/ML frameworks (Scikit-learn, TensorFlow, PyTorch).
  • Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
  • Excellent troubleshooting, analytical, and communication skills.
  • (Senior Level) Ability to lead, mentor, and manage technical teams.

Preferred Certifications:

  • Relevant certifications in observability, automation, cloud, or AI/ML platforms are a plus.
  • ITIL v4

Language:

  • Business proficiency in English.
  • Proficiency in Japanese is an added bonus.

Why This Role Matters

This role is critical to MetLife's digital transformation, enabling proactive, data-driven IT operations, reducing downtime, and accelerating innovation through automation and AI.

The application for this hackathon is open to individuals from all countries. The job opportunities are based in Kuala Lumpur, Malaysia.

Ready to innovate and showcase your skills? Join the MetLife Hack4Job event today: click Apply and secure your spot!

Job ID: 135892739