Site Reliability Engineer

2-8 years
Job Description

Must Have:
Experience in observability, capacity planning, issue analysis, and troubleshooting of large-scale, massively distributed, fault-tolerant systems running cloud-native applications in a microservices architecture;
Ability to debug scripts and automate routine tasks in OS, network, database or application servers. Coding experience beyond simple scripts;
Experience with DevOps processes and programming knowledge in at least one of the following languages: Java, Python, or Go;
Scripting skills in at least one of the following: Shell, Terraform, Ansible, Chef, or Puppet;
Deep understanding of Unix/Linux operating systems, virtual machines, containers, container management systems, enterprise cloud platforms, and data structures;
Engage in and improve the lifecycle of services, from launch through deployment, operation, and optimization for reliability and user experience;
Ensure services are reliable once they are live by measuring and monitoring availability, latency, and overall system health. Practice sustainable incident response;
Gather and analyze metrics from the tech stack to assist in performance tuning and fault handling of P0/P1/P2/P3 issues;
Participate in system design and platform management. Balance feature development speed and reliability with well-defined service level objectives;
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.

Good to Have:
Automation framework design using popular frameworks like SaltStack, Spinnaker, and StackStorm;
Experience managing large-scale big data clusters;
Experience in chaos engineering design and implementation;
Experience with large-scale container management platforms with autoscaling and intelligent scheduling;
Experience in data analysis, data science, or data development at petabyte (PB) scale;
Experience with SIEM, threat modelling, and vulnerability detection design, deployment, and optimization;
Experience in cloud services network design, rules/policy creation, deployment, performance tuning;
Experience in DB consistency detection, slow query tooling, and performance tuning of middleware including RDBMS, NoSQL, and distributed caches.

Professional Knowledge Requirements
Bachelor's degree or above in Computer Science or Electronics & Communication;
Have in-depth knowledge of the SRE role and DevOps processes;
Have strong observation and critical-thinking skills to handle business emergencies;
Ability to adapt to a dynamic environment and apply problem-solving skills to resolve issues;
Have excellent written and verbal communication skills;
Deep exposure to data analysis-based decision making scenarios;
Established record of continuous learning and upskilling in step with industry trends.

As software becomes ubiquitous in every device we use, defects are no longer a problem to be managed. They have to be predicted and excised. At Mindteck, quality goes beyond delivering error-free software. We view our processes and methodologies as an inherent feature enabling us to exceed customer expectations. Quality is a way of life at Mindteck, covering all processes, interactions and deliverables.

At Mindteck, we benchmark our quality processes against international standards like ISO and CMMI. We are ISO 9001:2000 and ISO 27001:2005 certified and an SEI CMMI Level 5 certified company. Besides these certifications, Mindteck is also ISO 13485:2003 certified to serve the medical electronics industry.

Our quality policy states - 'We shall strive to satisfy our customers by consistently ensuring cost effective and timely delivery of high quality software solutions.'
