
Tech Mahindra Limited

DevOps Lead

8-12 Years
Posted 28 days ago

Job Description

Job Title: Site Reliability Engineer (SRE) – CI/CD & Observability

Role Overview

We are seeking a highly motivated Site Reliability Engineer (SRE) responsible for ensuring the reliability, performance, and scalability of critical enterprise applications and infrastructure. The role focuses on managing CI/CD pipelines and on proactive monitoring to maintain high service availability and operational excellence.

Key Responsibilities

CI/CD Pipeline Management

  • Design, implement, and maintain robust CI/CD pipelines to support automated build, test, and deployment processes.
  • Optimize pipeline performance, reliability, and security.
  • Integrate pipelines with version control systems, artifact repositories, and automated testing frameworks.
  • Support release management and continuous delivery practices.

Observability & Monitoring

  • Implement and manage application and infrastructure monitoring using Dynatrace.
  • Configure dashboards, alerts, and performance baselines to enable proactive issue detection.
  • Analyze system performance metrics, logs, and traces to identify optimization opportunities.
  • Drive observability best practices across applications and middleware layers.

AIOps & Incident Intelligence

  • Manage and configure BigPanda for event correlation, noise reduction, and incident prioritization.
  • Integrate monitoring tools with BigPanda to provide unified operational visibility.
  • Automate incident response workflows and improve Mean Time to Resolution (MTTR).


Reliability Engineering

  • Establish SRE practices including SLIs, SLOs, and error budgets.
  • Perform root cause analysis (RCA) for incidents and implement preventive measures.
  • Drive automation initiatives to reduce manual operational tasks.
  • Support capacity planning, resilience engineering, and high-availability architecture.

Collaboration

  • Work closely with DevOps, application teams, infrastructure teams, and ITSM teams.
  • Participate in incident response and on-call rotations.
  • Contribute to continuous improvement initiatives for platform reliability.

Required Skills

Technical Skills

  • Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, Azure DevOps, etc.)
  • Strong hands-on experience with Dynatrace monitoring and observability
  • Experience using BigPanda or other AIOps platforms
  • Experience with cloud environments (Azure / GCP)
  • Knowledge of container platforms (Docker, Kubernetes)
  • Strong scripting skills (Python, Bash, or PowerShell)

Platform & Infrastructure

  • Linux/Unix system administration
  • Experience with middleware and application servers
  • Familiarity with microservices architecture and distributed systems

Preferred Skills

  • Knowledge of SRE frameworks and reliability engineering practices
  • Experience with ITSM tools (ServiceNow / Remedy)
  • Experience with Infrastructure as Code (Terraform / ARM)
  • Understanding of logging platforms (Azure Log Analytics, Application Insights and Azure Monitor, Splunk)


More Info

Job Type:
Function:
Employment Type:

Job ID: 145346481
