Great Eastern Sun

Lead, Service Reliability (Incident & Problem Management)

  • Posted 7 hours ago

Job Description

About the Job

The Lead, Service Reliability Incident & Problem Management is accountable for the enterprise-wide reliability, resilience, and operational stability of IT Infrastructure and platform services. This role owns the end-to-end Incident and Problem Management capability, ensuring rapid service restoration, elimination of systemic issues, and continuous improvement aligned to business, risk, and regulatory expectations.

Operating as the authoritative lead during major incidents, this role provides decisive leadership, clear direction, and executive-level communication during high-impact events. In addition, the role oversees a 24x7 IT Command Center operation, ensuring continuous monitoring, effective triage, and mature escalation practices.

As a senior leader in IT Service Operations, this role is responsible for driving the maturity of operational capabilities through AIOps, automation, and shift-left strategies, evolving the organization from reactive operations to predictive and preventive service reliability, while maintaining strong governance and operational risk controls.

Key Responsibilities

Incident Management

  • Own and lead the Incident Management process end-to-end, ensuring rapid and effective restoration of services.
  • Act as Incident Commander for Major Incidents (P0/P1/P2), providing leadership, prioritization, and decision-making authority.
  • Coordinate cross-functional response across Infrastructure, Applications, IT Security, Vendors, and Business stakeholders.
  • Ensure adherence to SLAs, OLAs, and experience-based KPIs, with a focus on reducing business impact.
  • Ensure accurate incident records, post-incident reviews, and executive-level reporting.

Problem Management

  • Own the Problem Management capability, ensuring strong discipline in root cause analysis and permanent remediation.
  • Lead problem review forums and drive accountability for corrective and preventive actions across engineering teams.
  • Reduce recurring incidents and technical debt through systemic problem elimination.
  • Maintain and mature the problem knowledge base to support knowledge-driven and shift-left operations.
  • Analyze incident and problem trends to proactively identify risks and resilience gaps.

AIOps & Capability Maturity

  • Define and own the AIOps capability maturity roadmap for IT Service Operations, aligned to service reliability, cost efficiency, and customer experience outcomes.
  • Drive the evolution from alert-driven operations to intelligent event correlation, noise reduction, and predictive insights.
  • Partner with Infrastructure, Application, and IT Security teams to integrate telemetry, logs, metrics, and topology data into operational intelligence platforms.
  • Govern the use of AI-assisted detection, triage, prioritization, and root cause analysis, ensuring accuracy, explainability, and trust.
  • Drive adoption of automation and self-healing capabilities, with appropriate risk controls and human-in-the-loop governance.
  • Establish and track success metrics for AIOps adoption (e.g. MTTR improvement, alert noise reduction, incident avoidance).

Process Governance & Continuous Improvement

  • Define, enforce, and annually review Incident and Problem Management policies, standards, and procedures.
  • Monitor process and operational performance, identifying improvement opportunities through data-driven insights.
  • Drive continuous improvement initiatives across Incident, Problem, Command Center, and automation practices.
  • Provide training and guidance to internal teams and partners on best practices and expected behaviors.
  • Act as the primary point of contact for audits and reviews related to Incident and Problem Management.

24x7 Command Center Oversight

  • Provide leadership oversight of the 24x7 Command Center, ensuring effective monitoring, triage, and escalation.
  • Evolve the Command Center into an AI-augmented operations capability, leveraging AIOps for decision support and prioritization.
  • Ensure runbooks, playbooks, and escalation frameworks are well defined and consistently executed.

Stakeholder & Risk Management

  • Act as the primary escalation point for Incident and Problem Management to senior management and business stakeholders.
  • Communicate clearly and effectively during critical events, ensuring transparency and timely updates throughout the incident lifecycle.
  • Partner closely with Group Risk Management, Audit, and Compliance to ensure strong operational risk management.
  • Provide structured post-incident reporting, translating technical issues into business impact and risk considerations.

Leadership & Culture

  • Champion operational excellence, accountability, and resilience-first thinking across IT Service Operations.
  • Lead, coach, and develop teams to operate effectively under pressure and during high-impact events.
  • Promote a culture of continuous improvement, learning, and collaboration across internal teams and vendors.

We are looking for people who have

  • Bachelor's or Professional Degree in IT, Computer Science, or equivalent.
  • 10+ years of experience in IT Service Management, with deep expertise in Incident and Problem Management.
  • 5+ years in a senior leadership or lead role within IT Operations or Service Management.
  • Proven experience leading Major Incidents in complex, enterprise environments.
  • Strong hands-on experience with ServiceNow or equivalent ITSM platforms.
  • Strong understanding and practical application of ITIL practices.
  • Experience operating in 24x7, mission-critical environments.
  • Demonstrated experience driving or governing AIOps and automation adoption, beyond tool implementation.
  • Familiarity with Generative AI use cases for IT Operations (e.g. incident summarisation, assisted RCA).

How you succeed

  • Champion and embody our Core Values in everyday tasks and interactions.
  • Demonstrate a high level of integrity and accountability.
  • Take initiative to drive improvements and embrace change.
  • Take accountability for business and regulatory compliance risks, implementing measures to mitigate them effectively.
  • Keep abreast of industry trends, regulatory compliance, and emerging threats and technologies to understand and highlight potential concerns and risks, proactively safeguarding our company.

Who we are

Founded in 1908, Great Eastern is a well-established market leader and trusted brand in Singapore and Malaysia. With over S$100 billion in assets and more than 16 million policyholders, including 12.5 million from government schemes, it provides insurance solutions to customers through three successful distribution channels: a tied agency force, bancassurance, and financial advisory firm Great Eastern Financial Advisers. The Group also operates in Indonesia and Brunei.

The Great Eastern Life Assurance Company Limited and Great Eastern General Insurance Limited have been assigned the financial strength and counterparty credit ratings of AA- by S&P Global Ratings since 2010, one of the highest among Asian life insurance companies. Great Eastern's asset management subsidiary, Lion Global Investors Limited, is one of the leading asset management companies in Southeast Asia.

Great Eastern is a subsidiary of OCBC, the longest established Singapore bank, formed in 1932. It is the second largest financial services group in Southeast Asia by assets and one of the world's most highly-rated banks, with an Aa1 rating from Moody's and AA- by both Fitch and S&P. Recognised for its financial strength and stability, OCBC is consistently ranked among the World's Top 50 Safest Banks by Global Finance and has been named Best Managed Bank in Singapore by The Asian Banker.

To all recruitment agencies: Great Eastern does not accept unsolicited agency resumes. Please do not forward resumes to our email or our employees. We will not be responsible for any fees related to unsolicited resumes.

Job ID: 145206379
