Great Eastern Sun

Lead, Service Reliability (Incident & Problem Management)

Job Description

About the Job

The Lead, Service Reliability – Incident & Problem Management is accountable for the enterprise-wide reliability, resilience, and operational stability of IT Infrastructure and platform services. This role owns the end-to-end Incident and Problem Management capability, ensuring rapid service restoration, elimination of systemic issues, and continuous improvement aligned to business, risk, and regulatory expectations.

Operating as the authoritative lead during major incidents, this role provides decisive leadership, clear direction, and executive‑level communication during high‑impact events. In addition, the role oversees a 24x7 IT Command Center operation, ensuring continuous monitoring, effective triage, and mature escalation practices.

As a senior leader in IT Service Operations, this role is responsible for maturing operational capabilities through AIOps, automation, and shift‑left strategies, evolving the organization from reactive operations to predictive and preventive service reliability while maintaining strong governance and operational risk controls.

Key Responsibilities

Incident Management

  • Own and lead the Incident Management process end‑to‑end, ensuring rapid and effective restoration of services.
  • Act as Incident Commander for Major Incidents (P0/P1/P2), providing leadership, prioritization, and decision‑making authority.
  • Coordinate cross‑functional response across Infrastructure, Applications, IT Security, Vendors, and Business stakeholders.
  • Ensure adherence to SLAs, OLAs, and experience‑based KPIs, with a focus on reducing business impact.
  • Ensure accurate incident records, post‑incident reviews, and executive‑level reporting.

Problem Management

  • Own the Problem Management capability, ensuring strong discipline in root cause analysis and permanent remediation.
  • Lead problem review forums and drive accountability for corrective and preventive actions across engineering teams.
  • Reduce recurring incidents and technical debt through systemic problem elimination.
  • Maintain and mature the problem knowledge base to support knowledge‑driven and shift‑left operations.
  • Analyze incident and problem trends to proactively identify risks and resilience gaps.

AIOps & Capability Maturity

  • Define and own the AIOps capability maturity roadmap for IT Service Operations, aligned to service reliability, cost efficiency, and customer experience outcomes.
  • Drive the evolution from alert‑driven operations to intelligent event correlation, noise reduction, and predictive insights.
  • Partner with Infrastructure, Application, and IT Security teams to integrate telemetry, logs, metrics, and topology data into operational intelligence platforms.
  • Govern the use of AI‑assisted detection, triage, prioritization, and root cause analysis, ensuring accuracy, explainability, and trust.
  • Drive adoption of automation and self‑healing capabilities, with appropriate risk controls and human‑in‑the‑loop governance.
  • Establish and track success metrics for AIOps adoption (e.g. MTTR improvement, alert noise reduction, incident avoidance).

Process Governance & Continuous Improvement

  • Define, enforce, and annually review Incident and Problem Management policies, standards, and procedures.
  • Monitor process and operational performance, identifying improvement opportunities through data‑driven insights.
  • Drive continuous improvement initiatives across Incident, Problem, Command Center, and automation practices.
  • Provide training and guidance to internal teams and partners on best practices and expected behaviors.
  • Act as the primary point of contact for audits and reviews related to Incident and Problem Management.

24x7 Command Center Oversight

  • Provide leadership oversight of the 24x7 Command Center, ensuring effective monitoring, triage, and escalation.
  • Evolve the Command Center into an AI‑augmented operations capability, leveraging AIOps for decision support and prioritization.
  • Ensure runbooks, playbooks, and escalation frameworks are well defined and consistently executed.

Stakeholder & Risk Management

  • Act as the primary escalation point for Incident and Problem Management to senior management and business stakeholders.
  • Communicate clearly and effectively during critical events, ensuring transparency and timely updates throughout the incident lifecycle.
  • Partner closely with Group Risk Management, Audit, and Compliance to ensure strong operational risk management.
  • Provide structured post‑incident reporting, translating technical issues into business impact and risk considerations.

Leadership & Culture

  • Champion operational excellence, accountability, and resilience‑first thinking across IT Service Operations.
  • Lead, coach, and develop teams to operate effectively under pressure and during high‑impact events.
  • Promote a culture of continuous improvement, learning, and collaboration across internal teams and vendors.

We are looking for people who have

  • Bachelor's or Professional Degree in IT, Computer Science, or equivalent.
  • 10+ years of experience in IT Service Management, with deep expertise in Incident and Problem Management.
  • 5+ years in a senior leadership or lead role within IT Operations or Service Management.
  • Proven experience leading Major Incidents in complex, enterprise environments.
  • Strong hands‑on experience with ServiceNow or equivalent ITSM platforms.
  • Strong understanding and practical application of ITIL practices.
  • Experience operating in 24x7, mission‑critical environments.
  • Demonstrated experience driving or governing AIOps and automation adoption, beyond tool implementation.
  • Familiarity with Generative AI use cases for IT Operations (e.g. incident summarisation, assisted RCA).

How you succeed

  • Champion and embody our Core Values in everyday tasks and interactions.
  • Demonstrate a high level of integrity and accountability.
  • Take initiative to drive improvements and embrace change.
  • Take accountability for business and regulatory compliance risks, implementing measures to mitigate them effectively.
  • Keep abreast of industry trends, regulatory compliance requirements, and emerging threats and technologies to identify and highlight potential concerns and risks, proactively safeguarding the company.

Who we are

Founded in 1908, Great Eastern is a well-established market leader and trusted brand in Singapore and Malaysia. With over S$100 billion in assets and more than 16 million policyholders, including 12.5 million from government schemes, it provides insurance solutions to customers through three successful distribution channels – a tied agency force, bancassurance, and financial advisory firm Great Eastern Financial Advisers. The Group also operates in Indonesia and Brunei.

The Great Eastern Life Assurance Company Limited and Great Eastern General Insurance Limited have been assigned the financial strength and counterparty credit ratings of AA- by S&P Global Ratings since 2010, one of the highest among Asian life insurance companies. Great Eastern's asset management subsidiary, Lion Global Investors Limited, is one of the leading asset management companies in Southeast Asia.

Great Eastern is a subsidiary of OCBC, the longest established Singapore bank, formed in 1932. It is the second largest financial services group in Southeast Asia by assets and one of the world's most highly-rated banks, with an Aa1 rating from Moody's and AA- by both Fitch and S&P. Recognised for its financial strength and stability, OCBC is consistently ranked among the World's Top 50 Safest Banks by Global Finance and has been named Best Managed Bank in Singapore by The Asian Banker.

To all recruitment agencies: Great Eastern does not accept unsolicited agency resumes. Please do not forward resumes to our email or our employees. We will not be responsible for any fees related to unsolicited resumes.

Job ID: 148943445