About the Job
The Senior DevOps Engineer is responsible for building and operating an ultra-low-latency, real-time transaction platform and odds engine. You will join the first core team launching a development hub dedicated to next-gen entertainment and SaaS platforms. In this role, you will design and scale systems that handle massive concurrent loads with precision, speed, and resilience.
Key Responsibilities
- Infrastructure & IaC: Design, deploy, and maintain infrastructure on Google Cloud Platform (GCP) using Terraform and GKE.
- Latency Optimization: Develop and optimize systems to achieve ultra-low latency and real-time transaction processing.
- System Resilience: Implement autoscaling, high availability, and fault tolerance for backend and odds engine systems.
- Deployment Pipelines: Set up robust CI/CD pipelines for automated deployment and delivery.
- Observability: Build deep observability (o11y) stacks for performance monitoring, tracing, and alerting.
- Cost Management: Act as a cost guardian by optimizing cloud resource usage and infrastructure spend.
- Cross-functional Collaboration: Work with backend and platform teams to ensure reliable and secure deployments.
- Incident Response: Drive incident response, post-mortems, and performance tuning for production systems.
- Tooling: Contribute to internal tools and automation using Go and shell scripting.
Requirements
- Technical Skills: Strong proficiency in Go (Golang) with hands-on DevOps implementation experience.
- Cloud Platform: Proven experience with GCP, especially GKE, Cloud Run, Pub/Sub, and Cloud SQL.
- Automation Tools: Expertise in Terraform, Infrastructure-as-Code principles, and CI/CD systems such as GitLab CI, Argo CD, or Jenkins.
- Orchestration: Deep understanding of Kubernetes, container orchestration, and autoscaling strategies.
- Performance Standards: A proven record of optimizing performance and achieving near-zero latency in production environments.
- System Troubleshooting: Ability to troubleshoot distributed systems and lead technical incident resolution.
Advantages
- Industry Experience: Prior experience in high-concurrency, real-time, or transactional systems, with a preference for gaming, trading, or fintech backgrounds.
- Monitoring Expertise: Familiarity with o11y stacks including Prometheus, Grafana, or OpenTelemetry.
- Financial Oversight: Familiarity with cost optimization and GCP billing management.