Kuala Lumpur, Malaysia
About Horizontal: Established in the US in 2003, Horizontal solves complex challenges across two distinct businesses: Horizontal Digital and Horizontal Talent. We are consistently recognized as a top workplace and one of the fastest-growing private companies. Horizontal Talent specializes in staffing for the IT, Digital & Creative, and Business & Strategy markets. We have global offices in the US, UAE, India, Malaysia, and Australia.
About the role: We are seeking an experienced and visionary Engineering Manager, Observability to lead the development and management of a highly scalable, robust observability ecosystem. This role blends software engineering, systems thinking, and leadership to ensure critical data-intensive, distributed systems are observable, reliable, and continuously improving.
You will be responsible for building and scaling observability platforms that deliver real-time insights into system health, performance, and security. This includes logging, monitoring, tracing, alerting, and data visualization capabilities that empower teams to detect and resolve issues proactively.
You'll collaborate cross-functionally with engineering, SRE, product, platform, and support teams to elevate system reliability and accelerate innovation across the organization's infrastructure landscape.
Responsibilities
Team Leadership & Mentorship
- Recruit and develop a high-performing, diverse engineering team.
- Provide coaching, technical guidance, and career development to team members.
- Foster an inclusive, collaborative, and growth-oriented team culture.
Technical Strategy & Delivery
- Lead architecture design and implementation of observability platforms supporting high-throughput, distributed environments.
- Drive software engineering best practices across observability systems, ensuring scalability, performance, and security.
- Own the roadmap for observability tools including metrics, logs, traces, alerting, and telemetry pipelines.
Cross-Functional Collaboration
- Partner with Product, SRE, Development, and Platform teams to align on observability needs and integrate solutions into existing workflows.
- Act as a subject matter expert in observability across the organization.
Innovation & Continuous Improvement
- Stay ahead of industry trends in observability, monitoring, and reliability engineering.
- Promote continuous learning, knowledge sharing, and adoption of modern practices such as OpenTelemetry and AI-driven observability.
Operational Excellence
- Lead initiatives for incident detection, root cause analysis, and system health monitoring.
- Drive automation for observability infrastructure and configuration management using Infrastructure as Code (IaC) tools.
- Maintain compliance and reliability in high-availability, multi-tenant environments.
Performance & Process Management
- Define team objectives and OKRs.
- Conduct performance evaluations and provide structured feedback to support career growth.
Requirements
Professional Skills
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Strong leadership, interpersonal, and stakeholder management skills.
- Passion for building inclusive teams and developing engineers.
- Excellent communication skills, both written and verbal.
- Customer-focused mindset with a strong emphasis on quality and reliability.
- Working knowledge of Agile, DevOps, and continuous delivery practices.
Key Qualifications
- 8+ years of software engineering experience, ideally working with distributed systems.
- 5+ years of engineering leadership experience, including managing observability or platform teams.
- Deep expertise in observability technologies such as:
  - Elastic Stack (Elasticsearch, Logstash, Kibana)
  - Prometheus, Grafana, OpenTelemetry, or similar
- Experience with real-time data pipelines and streaming platforms (e.g., Apache Kafka).
- Proficiency in cloud platforms (AWS, GCP, or Azure) and containerization (Docker, Kubernetes).
- Familiarity with automation and IaC tools (Terraform, Ansible, CI/CD pipelines).
- Strong debugging and troubleshooting skills across the full software stack.
Preferred Qualifications
- Experience implementing AI-driven observability or anomaly detection.
- Knowledge of large-scale, high-volume real-time systems.
- Exposure to regulated industries such as fintech or financial services.
- Experience running observability as a managed service within a multi-tenant environment.
The above description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice.