Job Title: Site Reliability Engineer (SRE) – CI/CD & Observability
Role Overview
We are seeking a highly motivated Site Reliability Engineer (SRE) responsible for ensuring the reliability, performance, and scalability of critical enterprise applications and infrastructure. The role focuses on managing CI/CD pipelines and proactive monitoring to maintain high service availability and operational excellence.

Key Responsibilities
CI/CD Pipeline Management
- Design, implement, and maintain robust CI/CD pipelines to support automated build, test, and deployment processes.
- Optimize pipeline performance, reliability, and security.
- Integrate pipelines with version control systems, artifact repositories, and automated testing frameworks.
- Support release management and continuous delivery practices.
Observability & Monitoring
- Implement and manage application and infrastructure monitoring using Dynatrace.
- Configure dashboards, alerts, and performance baselines to enable proactive issue detection.
- Analyze system performance metrics, logs, and traces to identify optimization opportunities.
- Drive observability best practices across applications and middleware layers.
AIOps & Incident Intelligence
- Manage and configure BigPanda for event correlation, noise reduction, and incident prioritization.
- Integrate monitoring tools with BigPanda to provide unified operational visibility.
- Automate incident response workflows to reduce Mean Time to Resolution (MTTR).
Reliability Engineering
- Establish SRE practices including SLIs, SLOs, and error budgets.
- Perform root cause analysis (RCA) for incidents and implement preventive measures.
- Drive automation initiatives to reduce manual operational tasks.
- Support capacity planning, resilience engineering, and high-availability architecture.
Collaboration
- Work closely with DevOps, application teams, infrastructure teams, and ITSM teams.
- Participate in incident response and on-call rotations.
- Contribute to continuous improvement initiatives for platform reliability.

Required Skills
Technical Skills
- Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, Azure DevOps, etc.)
- Strong hands-on experience with Dynatrace monitoring and observability
- Experience using BigPanda or other AIOps platforms
- Experience with cloud environments (Azure / GCP)
- Knowledge of container platforms (Docker, Kubernetes)
- Strong scripting skills (Python, Bash, or PowerShell)
Platform & Infrastructure
- Linux/Unix system administration
- Experience with middleware and application servers
- Familiarity with microservices architecture and distributed systems

Preferred Skills
- Knowledge of SRE frameworks and reliability engineering practices
- Experience with ITSM tools (ServiceNow / Remedy)
- Experience with Infrastructure as Code (Terraform / ARM templates)
- Understanding of logging platforms (Azure Log Analytics, Application Insights, Azure Monitor, Splunk)