JOB SUMMARY
We are seeking a highly experienced, technically deep, and governance-driven Senior Infrastructure Manager to take full ownership of our enterprise ICT infrastructure operations. The successful candidate will bring a proven track record managing large-scale, heterogeneous server environments of 150+ servers spanning multiple operating system platforms -- including IBM AIX, Red Hat / Ubuntu Linux, and Windows Server -- within a structured, compliance-driven organisation.
This is not an entry-level leadership role. We require a seasoned professional who has personally designed, implemented, and enforced rigorous infrastructure governance frameworks -- including change control gates, configuration management, capacity governance, and lifecycle policies -- at scale. You will be expected to bring deep working knowledge of multi-platform server ecosystems and the gravitas to drive operational excellence across a complex, mission-critical environment.
KEY RESPONSIBILITIES
1. Physical Server Management
- Manage the end-to-end hardware lifecycle for a fleet of 150+ physical servers -- including procurement, rack & stack installation, maintenance, and formal decommissioning procedures.
- Oversee data centre operations ensuring optimal environmental conditions (power redundancy, precision cooling, hot/cold aisle containment, structured cabling).
- Maintain a live, accurate Configuration Management Database (CMDB) reflecting all physical assets, relationships, lifecycle status, and configuration baselines.
- Conduct rigorous capacity planning, performance baselining, and proactive hardware refresh forecasting.
- Manage OEM vendor relationships (Dell, HPE, IBM) for hardware support contracts, warranty renewals, and firmware / BIOS / UEFI / iDRAC / iLO management.
- Enforce physical security controls, asset tagging standards, and audit-readiness across all data centre hardware.
2. Virtualisation Management
- Lead the administration of enterprise hypervisor platforms -- VMware vSphere / ESXi, Microsoft Hyper-V, and IBM PowerVM / VIOS.
- Govern VM provisioning workflows including resource allocation (CPU, RAM, storage), golden template management, and automated deployment pipelines.
- Manage snapshot governance policies, Storage vMotion operations, DRS rules, and cluster High Availability (HA) / Fault Tolerance (FT) configurations.
- Enforce hypervisor-level security hardening (STIG compliance, vSphere Security Configuration Guide), patch management schedules, and host configuration baselines.
- Oversee virtualisation capacity planning -- ensuring compute and storage pools are right-sized, with defined thresholds triggering formal capacity review gates.
- Manage integration between virtualisation layers and underlying SAN / NAS storage (Fibre Channel, iSCSI, NFS datastores).
3. ICT Operations Administration -- Multi-Platform Server Ecosystems
- Administer and govern a heterogeneous server environment spanning IBM AIX (LPAR / VIOS / HMC), Red Hat Enterprise Linux (RHEL) / Ubuntu Linux, and Windows Server (2016 -- 2022).
- Manage Active Directory, DNS, DHCP, Group Policy, and identity federation across the Windows estate.
- Manage LPAR configuration, PowerVM, HMC administration, and OS patching via NIM (Network Installation Manager) across the AIX estate.
- Oversee kernel patching, package management (YUM / DNF / RPM on RHEL, APT on Ubuntu), SELinux policies, systemd services, and cron-based automation across the Linux estate.
- Implement and maintain enterprise monitoring platforms (SolarWinds, Nagios, IBM Tivoli, Zabbix) covering availability, performance thresholds, and capacity trends across all 150+ servers.
- Lead Incident Management for P1 / P2 infrastructure events -- owning the full lifecycle from detection through containment, resolution, and formal Root Cause Analysis (RCA) documentation.
- Architect, test, and maintain Disaster Recovery (DR) plans and Business Continuity Plans (BCPs), including regular failover drills and DR runbook maintenance.
- Oversee backup and recovery operations (Veeam, IBM Spectrum Protect / TSM, Commvault), ensuring RPO / RTO targets are met and backup integrity is regularly validated.
4. Infrastructure Governance, Gates & Compliance
- Design, implement, and maintain a formal Infrastructure Governance Framework encompassing Change Advisory Board (CAB) processes, approval gates, RFC workflows, and post-implementation reviews.
- Enforce strict change control gates for all infrastructure modifications -- including pre-change risk assessments, implementation plans, rollback procedures, and documented sign-off authorities.
- Own and maintain Configuration Management (CMDB) discipline, ensuring all 150+ servers are accurately documented with configuration baselines, change history, and dependency mappings.
- Drive Patch Governance across all platforms (AIX, Linux, Windows) -- maintaining patch currency schedules, exception registers, and documented risk acceptances for any deferred patches.
- Lead infrastructure audit readiness -- producing evidence packs, compliance dashboards, and policy adherence reports for internal audit, ISO 27001, and regulatory reviews.
- Establish and maintain Standard Operating Procedures (SOPs), Runbooks, and Knowledge Base articles for all recurring operational activities across the server estate.
- Govern Privileged Access Management (PAM) -- ensuring least-privilege principles, regular access reviews, and segregation of duties across all server platforms.
5. Team Leadership & Operational Governance
- Lead and coordinate a cross-functional team of System Administrators and Infrastructure Engineers across AIX, Linux, and Windows disciplines.
- Chair regular Infrastructure Operations Review meetings -- tracking open incidents, the change pipeline, capacity alerts, and overall compliance posture.
- Report infrastructure health, risk posture, and governance metrics to senior leadership and the CIO via structured monthly operational dashboards and executive summaries.
- Ensure the team operates within defined ITIL-aligned processes for Incident, Problem, Change, and Configuration Management.
- Drive a culture of documentation-first operations -- ensuring no undocumented changes, no informal configurations, and full traceability of all actions taken across the server estate.
QUALIFICATIONS & EDUCATION
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field -- or equivalent demonstrated operational experience at scale.
- Advanced knowledge of Data Centre Infrastructure Management (DCIM) principles and tooling.
- Formal training or academic grounding in ITIL Service Management or an equivalent IT governance framework.
- Proven track record managing enterprise-grade, multi-platform server environments at scale (150+ servers).
EXPERIENCE REQUIREMENTS
- 10+ years of hands-on experience in IT Infrastructure, with at least 4 years in a senior managerial or lead capacity.
- Demonstrated experience managing a server estate of 150 or more servers in a single, complex enterprise environment.
- Hands-on production administration across at least two of the following: IBM AIX, Red Hat / Ubuntu Linux, Windows Server.
- Proven experience implementing and enforcing formal change control gates and infrastructure governance frameworks.
- Extensive experience with enterprise server hardware: Dell PowerEdge, HPE ProLiant, and IBM Power Systems.
- Deep experience with SAN / NAS platforms (NetApp, Dell EMC, IBM Storwize) is essential.
- Prior experience presenting infrastructure governance metrics and operational risk reports to C-level stakeholders.
TECHNICAL SKILLS & CERTIFICATIONS
Core Certifications (Highly Preferred)
- VMware VCP-DCV
- Microsoft MCSE / Azure Administrator
- CompTIA Server+
- ITIL v4 Foundation
- Cisco CCNA (Data Center)
- IBM AIX Administration
- Red Hat RHCSA / RHCE
- PRINCE2 / PMP (Advantageous)
Technical Stack Proficiency
- Server OS Platforms: IBM AIX (6.1 -- 7.x, LPAR / PowerVM / HMC), RHEL (7 / 8 / 9), Ubuntu LTS, Windows Server 2016 / 2019 / 2022.
- Virtualisation: VMware vSphere 7 / 8, vCenter, vSAN, Microsoft Hyper-V, IBM PowerVM / VIOS, Nutanix AHV.
- Hardware: Dell PowerEdge (blade & rack), HPE ProLiant / Synergy, IBM Power Systems (POWER8 / POWER9 / Power10), Cisco UCS.
- Storage: NetApp ONTAP, Dell EMC VNX / Unity / PowerStore, IBM Storwize / SVC -- FC, iSCSI, NFS.
- Monitoring & ITSM: SolarWinds, Nagios / Zabbix, IBM Tivoli, ServiceNow (Change / CMDB modules).
- Backup & Recovery: Veeam Backup & Replication, IBM Spectrum Protect (TSM), Commvault.
- Automation & Scripting: PowerShell, Bash / Shell scripting, Ansible (infrastructure automation).
- Governance Tools: CMDB platforms, CAB / RFC workflows, audit evidence management.
COMPETENCIES & SOFT SKILLS
- Problem Solving: Exceptional analytical and diagnostic skills for troubleshooting complex, cross-platform hardware / software issues across AIX, Linux, and Windows simultaneously.
- Governance Mindset: A natural inclination toward process rigour -- demonstrating an instinct for gates, controls, documentation, and auditability in every operational decision.
- Communication: Ability to translate complex, multi-platform technical issues into clear business language for senior stakeholders, audit teams, and non-technical executives.
- Crisis Management: Battle-tested composure and decisive leadership under pressure during P1 outages affecting large server fleets -- capable of coordinating multi-team responses across platforms.
- Strategic Thinking: Forward-looking approach to infrastructure evolution -- planning technology refresh cycles, platform consolidation, and hybrid cloud integration while managing day-to-day operations.
- Accountability: Takes full ownership of the server estate -- no ambiguity, no gaps, no undocumented configurations. Instils the same standard in the broader team.