We're looking for a DevOps Engineer with 3-5 years of experience to help us build highly observable, resilient, and testable systems across our microservices ecosystem. This is a hands-on role for someone who enjoys working at the boundary of development and operations.
You'll be diving into a system with hundreds of microservices, helping us identify weak spots, build self-healing mechanisms, and level up our observability and quality assurance efforts.
Responsibilities
- Manage our Kubernetes systems that host the Postgres database
- Enhance our observability systems using tools like Grafana and CloudWatch to enable real-time monitoring, alerting, and diagnosis
- Analyze our microservices landscape to identify and implement self-healing strategies
- Improve early vulnerability detection by adding security gates to the code pipeline
- Design and enforce robust health checks for services and background jobs
- Improve and manage logging, tracing, and alerting pipelines
- Build infrastructure using Terraform and deploy containerized services with Docker on AWS ECS
- Use core AWS services (ECS, EKS, EC2, IAM, S3, SQS, CloudTrail) to manage and scale cloud workloads
- Improve CI/CD pipelines to include observability hooks and automated test gates
- Contribute to Node.js-based backend application development
Qualifications
- Bachelor's Degree in Computer Science or a related field (or equivalent practical experience)
Certifications
- Certified Kubernetes Security Specialist (or similar) and/or
- AWS Solutions Architect Professional and/or
- AWS DevOps Engineer Professional
Skills & Experience
- STRONG Kubernetes experience (beyond basic setup and management)
- 3-5 years in full-stack or backend engineering (Node.js, Express, React, TypeScript)
- Strong experience with AWS services: ECS, EKS, EC2, S3, IAM, CloudWatch, SQS, CloudTrail
- Solid knowledge of integration testing, health checks, and service readiness probes
- Proficient in building and using Grafana dashboards and integrating observability tools
- Hands-on with Terraform, Docker, and cloud-native infrastructure
Nice to Have
- Familiarity with OpenTelemetry or Prometheus
- Familiarity with building AI Infrastructure with observability
- Understanding of basic e-commerce concepts: products, orders, offers, categories
- Power user of AI tools like Claude Code, Cursor, Windsurf, etc.
Why Join Us
- Tackle the challenge of managing and stabilizing a complex system of 100+ microservices
- Own critical infrastructure and reliability features from observability to automated recovery
- Collaborate with an ambitious, remote-first team and help build production-grade platforms at scale