Senior SRE Platform Engineer

  • Posted 14 hours ago
  • Be among the first 10 applicants

Job Description

Role Overview

We are seeking an experienced Site Reliability Engineer (SRE) / Platform Engineer with strong hands-on expertise in reliability engineering, observability, incident response, and AWS cloud operations. The ideal candidate has 5-7+ years of experience working in production-grade environments, with a proven track record of implementing SLOs/SLIs, designing resilient platforms, and driving operational excellence through automation and infrastructure as code.

This is a senior technical role requiring deep engineering fundamentals, an ownership mindset, and the ability to collaborate across engineering, product, and security teams.

Key Responsibilities

Reliability Engineering

Define, implement, and maintain SLOs, SLIs, and SLAs across critical services.

Continuously measure service health and proactively improve reliability and performance.

Drive error budget policies and reliability-centered engineering practices.

Observability & Monitoring

Design and implement observability frameworks using Grafana (preferred), Prometheus, or equivalent tools.

Build dashboards, alerts, tracing, and log aggregation pipelines.

Ensure full visibility into system health, performance, and failure modes.

Incident Response Management

Lead major incident response and post-incident analysis.

Own on-call processes, escalation workflows, and response runbooks.

Drive root cause analysis (RCA), corrective actions, and long-term prevention strategies.

Continuously reduce MTTR and improve operational readiness.

Cloud Engineering (AWS)

Build, deploy, and maintain platform components using AWS services such as:

EC2, ECS, EKS, Lambda, DynamoDB, S3, IAM, API Gateway, CloudWatch, VPC.

Implement secure and scalable cloud architectures aligned with best practices.

Optimize cost, performance, and operational efficiency of cloud workloads.

Infrastructure as Code (IaC)

Implement and maintain IaC using Terraform, CloudFormation, or CDK.

Design reusable IaC modules, enforce standards, and ensure consistent environment provisioning.

Automate cloud infrastructure deployment, configuration, and compliance.

CI/CD & Automation

Design and maintain CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins, Bitbucket Pipelines, or AWS CodePipeline.

Champion automation practices across build, test, deployment, and operational processes.

Ensure secure, reliable, and auditable deployment workflows.

Platform Engineering

Design and maintain containerized workloads (Docker, ECS, EKS).

Manage service orchestration, runtime environments, and deployment strategies.

Evaluate and implement modern DevOps/SRE tools to improve developer productivity and platform reliability.

Collaboration & Leadership

Work closely with engineering, product, and architecture teams to embed SRE best practices.

Contribute to documentation, knowledge sharing, and continuous improvement initiatives.

Provide technical guidance and mentorship to junior team members.

Required Skills & Experience (Mandatory)

5-7+ years of experience in SRE, DevOps, Platform Engineering, or Cloud Infrastructure roles.

Expert knowledge of:

  • SLO/SLI/SLA design & management

  • Observability & Monitoring (Grafana required; Prometheus/ELK a plus)

  • Incident Response & On-Call Operations

  • AWS Services (hands-on, production-grade)

  • Infrastructure as Code (Terraform, CloudFormation, or CDK)

  • CI/CD pipeline engineering

Strong understanding of distributed systems, networking, containers, and cloud-native architectures.

Ability to troubleshoot complex production issues across application, infrastructure, and network layers.

Strong scripting skills (Python, Bash, or Go are advantages).

Excellent communication, analytical thinking, and problem-solving skills.

More Info

Job Type:
Function:
Employment Type:
Open to candidates from:
Malaysian

About Company

Job ID: 145002721