
OBJECTIVE & ROLE
Run the environments by monitoring availability and taking a holistic view of system health
Build software tools and systems to manage platform infrastructure and applications
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Provide primary operational support and engineering for multiple large distributed software applications
Lead and assist in implementing SRE best practices within the team and in collaboration with other teams
Help train and coach new team members who are unfamiliar with the technologies and SRE practices
Gather and analyze metrics from systems and applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service level objectives
Continuously improve the monitoring of the solution in operation
Continuously implement automation that improves operational efficiency and reliability
PROFILE
College degree or technical training in Computer Science, or an equivalent combination of training and experience
Professional experience: at least 3 years of prior experience in software development.
Technical skills:
Strong knowledge of Linux and virtual machines (VMs)
Competent knowledge of at least one database
Ability to program in one or more high-level languages, including Python
Experience with log analysis
Experience with root cause analysis
Knowledge of and experience with Nagios and Splunk will be an added advantage
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
Knowledge of or experience with other related technologies (OpenShift, Kubernetes) is advantageous
Specific job-related knowledge (mandatory):
Strong troubleshooting skills and a demonstrated ability to diagnose and analyze software issues
Experience managing 1st level support activities (Service Center, Support Line, etc.)
Experience with quality and performance monitoring processes
Experience with Microsoft Office tools (Excel, Word, PowerPoint, Visio)
Job ID: 143293077