Mahindra Satyam

IT Disaster Recovery (DR) Governance

  • Posted 17 hours ago

Job Description

Tech Mahindra represents the connected world, offering innovative and customer-centric information technology experiences, enabling Enterprises, Associates, and the Society to Rise. It has 150,000+ professionals working for 1000+ Global Customers (including Fortune 500 companies) in 90 Countries. We're part of the esteemed Mahindra group, headquartered in India. Under a new CEO, Tech Mahindra is committed to a transformative journey with Scale @ Speed as our guiding principle.

Job Description (JD): IT Disaster Recovery (DR) Governance

1) Role Title

IT Disaster Recovery (DR) Governance / DR Governance Lead

(Alternate titles: IT Resilience Governance Manager, DR Program Governance Lead, IT Continuity Governance Lead)

3) Key Responsibilities

A. DR Governance Framework & Standards

  • Define, implement, and maintain the DR governance model (policies, standards, procedures, controls, decision rights).
  • Establish DR lifecycle governance: strategy → design → implementation → test → review → improve.
  • Ensure alignment with enterprise BCM/ITSCM, cybersecurity, risk, compliance, and architecture standards.
  • Maintain DR documentation control: versioning, approvals, evidence retention, and audit-ready repositories.

B. DR Strategy, Scope & Service Criticality

  • Lead business impact alignment with BCM teams to confirm critical services, dependencies, and recovery objectives.
  • Own/maintain the DR scope register (Tiering, service criticality, DR patterns, recovery method, dependencies).
  • Ensure RTO/RPO, recovery sequencing, and minimum service levels are measurable and contract/governance aligned.
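One way to make RTO/RPO "measurable" is to record the agreed targets next to the figures observed in a DR test and compare them mechanically. The sketch below is illustrative only: the class names, field names, service name, and sample durations are assumptions, not part of this role's toolset.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    """Agreed targets for one service (hypothetical structure)."""
    service: str
    rto: timedelta  # maximum tolerable time to restore the service
    rpo: timedelta  # maximum tolerable data-loss window


@dataclass(frozen=True)
class MeasuredRecovery:
    """Figures observed during a DR test (hypothetical structure)."""
    service: str
    recovery_time: timedelta
    data_loss_window: timedelta


def evaluate(objective, measured):
    """Report, per objective, whether the measured figure is within target."""
    return {
        "rto_met": measured.recovery_time <= objective.rto,
        "rpo_met": measured.data_loss_window <= objective.rpo,
    }


# Sample values are invented for illustration.
objective = RecoveryObjective(
    "payments-api", rto=timedelta(hours=4), rpo=timedelta(minutes=15)
)
measured = MeasuredRecovery(
    "payments-api",
    recovery_time=timedelta(hours=3, minutes=20),
    data_loss_window=timedelta(minutes=25),
)
outcome = evaluate(objective, measured)
print(outcome)  # {'rto_met': True, 'rpo_met': False}
```

A result such as `rpo_met: False` is the kind of deviation that would feed the exception/waiver process described under Risk, Issue & Exception Management.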

C. DR Plans, Runbooks & Readiness

  • Govern creation and upkeep of:
      • DR Plans (service-based and site-based)
      • Technical Runbooks (step-by-step recovery procedures)
      • Dependency maps (apps ↔ infra ↔ network ↔ identity ↔ storage/backup)
      • Communication trees & escalation models for DR events
  • Validate plans are feasible (people, process, technology) and aligned to real operational capabilities.

D. DR Testing Program (Design, Execution & Closure)

  • Build and run the annual/quarterly DR test calendar (table-top, technical failover/failback, partial and full-scale).
  • Chair DR test rehearsals, coordinate participants, define entry/exit criteria, and manage test execution governance.
  • Capture outcomes: test evidence, results, deviations, gaps, and improvement actions.
  • Enforce closure: action owners, target dates, risk acceptance processes, and retest requirements.

E. Risk, Issue & Exception Management

  • Maintain DR risk register and issue log; assess impacts, prioritize remediation, track closure.
  • Govern exceptions/waivers (e.g., RTO/RPO not met): require business justification, risk acceptance, and compensating controls.
  • Drive continual improvement: lessons learned, maturity assessment, and roadmap updates.

F. DR Tooling, Monitoring & Metrics

  • Define DR governance reporting: KRIs/KPIs, dashboards, and executive summaries.
  • Ensure monitoring/telemetry exists for key resilience controls (backup health, replication status, failover readiness).
  • Validate configuration integrity: DR environment parity, patching/versions, access control, and change alignment.

G. Change Management & Release Governance

  • Integrate DR requirements into Change/Release processes:
      • ensure DR impact assessment is mandatory for major changes
      • validate DR plan updates for significant architecture/service changes
      • enforce testing following high-impact deployments
  • Provide governance sign-off for DR-related changes and ensure rollback/failback readiness.

H. Stakeholder & Vendor Management

  • Act as primary governance interface across:
      • Service Owners, Infrastructure/Cloud teams, Network/Security, App teams, Service Desk/ITOps, BCM, Audit/Risk
  • Govern supplier participation in DR tests and ensure contract/SLA clauses are met (evidence and reporting).
  • Facilitate SteerCo / governance forums: prepare packs, decisions, and action tracking.

I. DR Event Governance (During Actual Disaster)

  • Support incident leadership during major disruptions:
      • ensure DR invocation criteria are met and documented
      • coordinate governance communications, logging, approvals, and evidence capture
      • oversee controlled recovery and post-event review

4) Key Deliverables (Audit-Ready)

  • DR Governance Policy, Standards, and Control Framework
  • DR Scope & Tiering Register (service criticality, RTO/RPO, dependency mapping)
  • DR Plans and Technical Runbooks (service/site/platform level)
  • DR Test Strategy, Test Calendar, Test Scripts, Evidence Packs, and Test Reports
  • DR Risk Register, Issue Log, Waiver/Exception Register
  • DR Readiness Dashboard (KRIs/KPIs) and Executive SteerCo Pack
  • Post-Test / Post-Incident Lessons Learned and Improvement Roadmap
  • Annual DR Maturity Assessment & Program Plan

5) Success Measures (KPIs / KRIs)

These measures can be tailored to the applicable contract/SLA model:

Governance & Coverage

  • % of critical services with approved DR plans/runbooks (target: ≥ 95–100%)
  • % of DR documentation updated within defined cadence (e.g., quarterly)

Testing & Assurance

  • % planned DR tests executed on time (target: ≥ 90–95%)
  • % DR tests meeting RTO/RPO (target: baseline then improve QoQ)
  • Number of high/critical findings open beyond SLA (target: trending down)

Risk & Control

  • Time to close DR test actions (median days)
  • Number of active waivers/exceptions and their aging
  • Reduction in repeat findings across consecutive tests
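As a rough illustration of how the testing and closure measures above might be computed, the sketch below derives three of them from simple records. The record layout, the dates, and the choice to count an unexecuted test as "not on time" are all assumptions made for the example.

```python
from datetime import date
from statistics import median

# Hypothetical DR test records: (planned date, executed date or None, met RTO/RPO)
tests = [
    (date(2024, 1, 15), date(2024, 1, 15), True),
    (date(2024, 2, 10), date(2024, 2, 12), False),
    (date(2024, 3, 5), date(2024, 3, 4), True),
    (date(2024, 4, 20), None, False),  # planned but never executed
]

# Hypothetical remediation actions: (opened, closed)
actions = [
    (date(2024, 1, 16), date(2024, 1, 30)),
    (date(2024, 2, 13), date(2024, 3, 14)),
    (date(2024, 3, 6), date(2024, 3, 16)),
]


def pct(part, whole):
    """Percentage rounded to one decimal place; 0.0 when there is no data."""
    return round(100 * part / whole, 1) if whole else 0.0


executed = [t for t in tests if t[1] is not None]
on_time = [t for t in executed if t[1] <= t[0]]
met_objectives = [t for t in executed if t[2]]

# % planned DR tests executed on time (denominator: all planned tests)
pct_on_time = pct(len(on_time), len(tests))
# % DR tests meeting RTO/RPO (denominator: tests actually executed)
pct_meeting = pct(len(met_objectives), len(executed))
# Time to close DR test actions (median days)
median_days_to_close = median((closed - opened).days for opened, closed in actions)

print(pct_on_time, pct_meeting, median_days_to_close)  # 50.0 66.7 14
```

The denominators are a design choice: whether skipped tests count against the RTO/RPO measure, for instance, should be agreed up front so the figures stay comparable quarter over quarter.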

Operational Readiness

  • Backup/replication success rates, restore success rates (where measurable)
  • DR environment parity compliance (patch level/version drift)
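DR environment parity can be expressed as a comparison of component versions between the primary and DR sites. The following is a minimal sketch under stated assumptions: the inventory dictionaries, component names, and version strings are hypothetical, and a real check would pull this data from a CMDB or configuration tool.

```python
def version_drift(primary, dr):
    """Return components whose DR version differs from primary or is missing.

    Maps component name -> (primary version, DR version or None).
    """
    drift = {}
    for component, version in primary.items():
        dr_version = dr.get(component)
        if dr_version != version:
            drift[component] = (version, dr_version)
    return drift


# Hypothetical inventories for illustration only.
primary_site = {"os_patch": "2024-05", "postgres": "15.6", "app": "3.2.1"}
dr_site = {"os_patch": "2024-03", "postgres": "15.6"}

drift = version_drift(primary_site, dr_site)
parity_compliance = round(
    100 * (len(primary_site) - len(drift)) / len(primary_site), 1
)

print(drift)              # {'os_patch': ('2024-05', '2024-03'), 'app': ('3.2.1', None)}
print(parity_compliance)  # 33.3
```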

6) Required Skills & Competencies

Core Knowledge

  • End-to-end DR/IT Service Continuity governance and execution
  • Recovery design patterns: active-active, active-passive, warm standby, cold standby, backup/restore, pilot light
  • Strong understanding of enterprise IT: compute/virtualization, storage/backup, network, IAM/AD, databases, cloud services

Governance & Delivery Skills

  • Program governance, stakeholder management, and executive communication
  • Strong documentation discipline and audit evidence management
  • Risk management, exception handling, and control-based thinking
  • Experience integrating DR with ITIL processes (Incident/Problem/Change/Release)

Tools (examples)

  • ITSM tools (ServiceNow/Jira), CMDB, monitoring tools
  • Backup and replication tools (e.g., Veeam/NetBackup/Commvault/Zerto), cloud DR services (AWS/Azure)
  • Documentation repositories (SharePoint/Confluence), dashboarding (Power BI)

7) Experience & Qualifications

  • 7–12+ years in IT operations / infrastructure / service continuity / resilience, with 3–5+ years in DR governance or resilience leadership.
  • Proven track record in:
      • governing DR programs across multi-tower teams (cloud + on‑prem + apps)
      • leading DR tests and driving closure of findings
      • preparing audit-ready evidence and handling internal/external audits

Preferred Certifications (any combination)

  • ITIL (Foundation/Intermediate), ISO 22301/BCM exposure
  • ISO 27001 / security governance awareness
  • Cloud certifications (AWS/Azure), DR/BCP certifications (DRI/BCI) – optional but valuable

8) Behavioral Competencies

  • High ownership and persistence (drives closure)
  • Structured communicator (can write concise exec summaries + detailed runbooks)
  • Comfortable challenging risk acceptance and ensuring accountability
  • Calm under pressure (major incident / DR invocation scenarios)

9) Working Conditions / On-Call

  • Primarily business hours; on-call / extended hours during DR tests, major incidents, or DR invocation events.
  • May require coordination across time zones (onshore/offshore model).

Job ID: 146435965