
  • Posted 18 days ago

Job Description

This role is responsible for the reliability, availability, user experience, capacity planning, AIOps, process enhancement, and digitalization of cloud-based internet services.

Main responsibilities:

  • Serve as the SRE for assigned cloud services, owning the KPIs for service reliability, issue-to-resolution, service deployment, business continuity management, security policy planning, capacity planning, automation, etc.
  • Automation: Automate routine and manual operations tasks to reduce toil and improve efficiency.
  • Monitoring & Alerting: Implement and use monitoring systems to track system health, set up alerting, and create dashboards.
  • Incident Management: Respond to and manage incidents to minimize downtime and resolve issues quickly, including on-call support.
  • System Performance: Measure, analyze, and tune system performance to ensure efficiency and stability.
  • Infrastructure Management: Provision and manage cloud infrastructure, sometimes using Infrastructure as Code (IaC), and assist in platform management and capacity planning.
  • Reliability & Resilience: Build sustainable and reliable systems through software engineering practices, which can include resilience testing and chaos engineering.

Requirements:

  • Full-time Bachelor's degree or above (or equivalent) in computer science or a related discipline.
  • Be familiar with Linux, networking, and databases. Be able to program in one or more high-level languages, such as Python, Java, C/C++, or JavaScript.
  • Be familiar with containerization technologies like Docker and orchestration tools like Kubernetes.
  • Be familiar with configuration management and automation tools such as Ansible and Terraform.
  • Be familiar with monitoring, logging, and alerting tools like Splunk, Grafana, or Prometheus.
  • Have good communication, contingency-handling, organization, and coordination skills, as well as strong analytical and troubleshooting skills for complex systems.

Job ID: 133677627
