Head of Technology Resilience Operations

bell ward malaysia

Malaysia, Kuala Lumpur

12-14 Years

Posted a day ago

Job Description

Our client, a leading organisation within the financial services sector, is seeking a Head of Technology Resilience Operations. The role is responsible for the stability, reliability, and overall health of the bank's live production environment, spanning infrastructure, cybersecurity, applications, networks, and core banking platforms.

This role leads enterprise incident response, enhances observability and predictive capabilities, and strengthens production change governance to minimise operational risk and service impact. The position plays a critical role in ensuring uninterrupted operations and safeguarding the end‑customer experience.

Responsibilities:

Incident Response & Operational Resilience

Own and govern the end‑to‑end incident lifecycle, from detection and escalation through root cause analysis and resolution
Coordinate across technology teams, vendors, and business units to restore services within defined Recovery Time Objectives (RTOs) and in line with Bank Negara Malaysia's Risk Management in Technology (RMiT) requirements
Drive structured post‑incident reviews to capture lessons learned and continuously improve resilience

Observability & Service Performance

Design and implement enterprise‑wide observability frameworks across infrastructure, applications, networks, core banking, and cloud platforms
Establish advanced monitoring and alerting capabilities for proactive and forward‑looking insights
Deliver real‑time dashboards and performance reporting for technology and business leadership

Predictive Intelligence & Automation

Build and mature predictive analytics capabilities to anticipate disruptions before they impact operations
Apply AI and machine learning to detect anomalies, accelerate root cause analysis, and automate operational workflows
Use predictive insights to support compliance with current and future RMiT regulatory requirements

Production Environment Ownership

Act as the single point of accountability for production environment stability and integrity
Define and enforce governance over release schedules, change windows, and parallel deployments
Ensure seamless operations across on‑premises and cloud‑based environments

Governance, Vendors & Continuous Improvement

Lead crisis war rooms and post‑incident reviews with senior management
Maintain strong governance over vendors supporting incident management and observability tooling
Champion operational excellence through best practices such as ITIL and Site Reliability Engineering (SRE)

Key Stakeholders

CIO, CTO, and Technology Operations Leadership
Chief Information Security Officer
Heads of Application Development
Infrastructure and Network Leadership
Risk Management and Business Continuity teams
External technology and service providers

Ideal Candidate Profile

12+ years of experience in large‑scale technology operations, preferably within banking or financial services
Proven leadership managing high‑availability, business‑critical production environments
Strong expertise across observability platforms (APM, SIEM, log analytics, cloud monitoring)
Demonstrated crisis management and executive‑level incident leadership capabilities
Solid understanding of ITIL, DevOps, and SRE practices
Excellent communication, stakeholder engagement, and vendor management skills