Our client, a leading organisation within the financial services sector, is seeking a Head of Technology Resilience Operations. The role is responsible for the stability, reliability, and overall health of the bank's live production environment, spanning infrastructure, cybersecurity, applications, networks, and core banking platforms.
This role leads enterprise incident response, enhances observability and predictive capabilities, and strengthens production change governance to minimise operational risk and service impact. The position plays a critical role in ensuring uninterrupted operations and safeguarding the end‑customer experience.
Responsibilities
Incident Response & Operational Resilience
- Own and govern the end‑to‑end incident lifecycle, from detection and escalation through root cause analysis and resolution
- Coordinate across technology teams, vendors, and business units to restore services within defined Recovery Time Objectives (RTOs) and in line with Risk Management in Technology (RMiT) requirements
- Drive structured post‑incident reviews to capture lessons learned and continuously improve resilience
Observability & Service Performance
- Design and implement enterprise‑wide observability frameworks across infrastructure, applications, networks, core banking, and cloud platforms
- Establish advanced monitoring and alerting capabilities that give early warning of emerging issues
- Deliver real‑time dashboards and performance reporting for technology and business leadership
Predictive Intelligence & Automation
- Build and mature predictive analytics capabilities to anticipate disruptions before they impact operations
- Apply AI and machine learning to detect anomalies, accelerate root cause analysis, and automate operational workflows
- Use predictive insights to support compliance with current and future RMiT regulatory requirements
Production Environment Ownership
- Act as the single point of accountability for production environment stability and integrity
- Define and enforce governance over release schedules, change windows, and parallel deployments
- Ensure seamless operations across on‑premises and cloud‑based environments
Governance, Vendors & Continuous Improvement
- Lead crisis war rooms and post‑incident reviews with senior management
- Maintain strong governance over vendors supporting incident management and observability tooling
- Champion operational excellence through best practices such as ITIL and Site Reliability Engineering (SRE)
Key Stakeholders
- CIO, CTO, and Technology Operations Leadership
- Chief Information Security Officer
- Heads of Application Development
- Infrastructure and Network Leadership
- Risk Management and Business Continuity teams
- External technology and service providers
Ideal Candidate Profile
- 12+ years of experience in large‑scale technology operations, preferably within banking or financial services
- Proven leadership in managing high‑availability, business‑critical production environments
- Strong expertise across observability platforms (APM, SIEM, log analytics, cloud monitoring)
- Demonstrated crisis management and executive‑level incident leadership capabilities
- Solid understanding of ITIL, DevOps, and SRE practices
- Excellent communication, stakeholder engagement, and vendor management skills