Operation & Maintenance Service Engineer

3-5 Years

MYR 7,000 - 10,000 per month

Job Description

Major Fault Support & Review - Respond to Severity 1/2 incidents within 1 hour, log cases with Huawei TAC, lead troubleshooting, and conduct post-event fault reviews.
Operational Analysis - Deliver monthly reports covering cloud resource capacity, platform health, alarm analysis, and optimization recommendations.
Risk Check - Perform quarterly in-depth health checks (runtime status, capacity, configuration, warnings, version & license management) and provide rectification guidance.
Basic & Advanced Cloud Service Version Upgrade - Lead upgrade solution design, pre-upgrade checks, implementation, verification, and rollback for cloud platform and gPaaS/AI DaaS services.
Issue & Risk Management - Analyse issue trends, track unresolved escalated issues, and proactively identify and mitigate platform stability risks.
Resource & Capacity Management - Analyse resource usage and provide scaling or reconstruction recommendations.
Change Implementation - Implement configuration changes (including after-hours support of up to 8 hours/week as needed), obtain customer authorization, and perform rollback if required.
Urgent Recovery & Troubleshooting - Assist TAC in emergency fault recovery, common issue resolution, and rapid service restoration.
Routine PMI & Monitoring - Perform routine product inspections, monitor alarms via ManageOne/eSight, and ensure platform health.

Bachelor's or Master's degree in computer science, engineering, or a related major, with over 3 years of O&M experience, including hands-on work with cloud platforms (public/private cloud) or general IT (networks, OS, databases, middleware, basic IT components).
Familiar with HCS deployment and tenant O&M processes (backup management, resource inspection, requirement management, risk management, asset management, expense analysis, monitoring & alarming).
Strong foundational knowledge of datacom principles; familiar with TCP/IP, standard IP networking, and hybrid cloud networking; able to independently resolve basic datacom issues; proficient in physical network technologies and architectures.
Strong capability to identify and demarcate cloud problems, lead problem closure, and drive backend improvements.
Excellent customer service awareness and communication skills; able to work with multinational teams.
Attention to detail and strong execution capabilities.

Preferred Qualifications:

Experience maintaining or optimising large-scale data centre network services.
Experience with SDN delivery and maintenance.
Familiarity with chaos engineering, fault drills, stress tests, and architecture optimisation.
Experience in multinational team management (at least 2 years).