Nasstar

Service Stability Lead

Job Description

Application Deadline: 29 May 2026

Department: Connectivity

Location: Cyberjaya

Description

The Service Stability Lead owns the end‑to‑end management of high‑severity incidents and underlying problems, ensuring rapid service restoration and permanent resolution of root causes.

The role combines real‑time incident leadership with proactive and reactive problem management to ensure that:
  • Major Incidents are effectively led, controlled, and communicated
  • Root causes are identified, understood, and eliminated
  • Trends and risks are proactively identified and mitigated
  • Improvements are implemented through to completion via Change Management
The role acts as the operational lead during Major Incidents and the owner of service stability, ensuring incidents are resolved quickly and do not recur.

Key Responsibilities

Major Incident Management (P1 / MI / MSI)
  • Lead and coordinate all Critical Incidents to drive rapid service restoration.
  • Act as the single point of control during incidents, directing resolver groups, technical teams, and stakeholders
  • Chair incident bridge calls and maintain pace, direction, and accountability
  • Ensure clear, structured, and timely communications, aligned to customer and business expectations
  • Maintain primary focus on service restoration, with structured follow‑up for root cause analysis
  • Empowered to:
      • Drive prioritization and actions during Major Incidents
      • Challenge delays or inadequate responses
      • Escalate where required to protect service and customer outcomes
      • Influence technical and operational decision‑making to protect service and customer outcomes
Problem Management (End-to-End Ownership)
  • Own the end‑to‑end Problem Management lifecycle from identification through to closure
  • Ensure all Major Incidents transition into Problem records where required
  • Drive and quality assure Root Cause Analysis (RCA) using structured methodologies
  • Ensure all outputs are clear, fit for purpose (including customer‑facing where required), actionable, outcome‑driven, and tracked through to completion.
  • Produce and govern:
      • Post-Incident Reviews (PIRs)
      • Service Incident Reports (SIRs)
      • Root Cause Analysis Reports (RCAs)
End-to-End Lifecycle Ownership (Incident → Problem → Change)
  • Ensure clear linkage and traceability between:
      • Incidents
      • Problems
      • Known Errors
      • Changes
  • Track all remediation actions through to successful implementation via Change Management
  • Prevent RCA without resolution by enforcing accountability for delivery of permanent fixes
  • Work closely with Change Management to ensure fixes are:
      • Prioritized appropriately
      • Implemented safely
      • Delivering intended outcomes
Continuous Improvement, Risk & Prevention
  • Analyze incident data to identify:
      • Recurring issues and trends
      • Systemic weaknesses
      • Service risks (including legacy or accepted risks)
  • Define and improve:
      • Problem Management methodologies
      • KPIs and reporting frameworks
      • Preventative controls
  • Identify and drive improvements in:
      • Monitoring and alerting
      • Early detection capabilities
      • Service resilience
  • Apply proportionate operational approaches (e.g. streamlined handling for known or repeat issues) to balance efficiency with effectiveness
Communication, Reporting & Insight
  • Develop and deliver insight-led reporting, including:
      • P1 / MI / MSI trends
      • Root cause categorization
      • Service partner performance
      • Recurrence and stability metrics
  • Provide clear, insight‑led narratives for SLT, linking incidents to customer impact, root cause, and improvement actions
  • Ensure all communications are:
      • Clear, concise, and consistent
      • Aligned to agreed standards and terminology
Process, Governance & Tooling
  • Ensure adherence to Nasstar Major Incident and Problem Management processes
  • Provide guidance, coaching, and support to operational teams
  • Contribute to the development of:
      • Processes
      • Templates
      • Documentation
      • Reporting frameworks
  • Drive improvements in ServiceNow, including:
      • Incident / Major Case / Problem linkage
      • Communication plans and updates
      • Data quality and reporting integrity

Skills, Knowledge and Expertise


  • ITIL Foundation (v4) or equivalent experience
  • Proven experience managing Major Incidents in a 24/7 environment
  • Experience owning Problem Management lifecycle end-to-end
  • Strong understanding of ITIL processes (Incident, Problem, Change)
  • Experience within an MSP / Managed Services environment
  • Experience driving root cause analysis and service improvement initiatives
  • Strong incident leadership and decision-making under pressure
  • Excellent stakeholder and customer communication skills
  • Strong facilitation skills (incident calls, RCA sessions)
  • Technical awareness to challenge and guide resolver teams
  • Proactive, organized, and focused on outcomes and service stability

Benefits

  • Competitive salary based on experience.
  • Opportunity to work with a dynamic international team.
  • Training and development provided.
  • Career progression opportunities across different divisions and departments.

Job ID: 148287395