
Tap Growth ai

Lead Data Engineer

  • Posted 14 hours ago

Job Description

Role: Lead Data Engineer
Location: Kuala Lumpur, Malaysia (WFO, 5 Days)
Company: Confidential
Payroll Company: IIT Matrix

CONTRACT Role

Eligible: Malaysian or Permanent Residents only


We are building a dedicated Platform Engineering Team of five members: Senior Developers with 7-8 years of experience. The key responsibilities and the required skills and experience for this team are outlined below.

Key Responsibilities

  • Design and develop scalable, high-performance data pipelines across Hadoop ecosystem components (Hive, Impala, Spark, Kafka, and Iceberg).
  • Build robust data ingestion and transformation frameworks using Java, Spark, Python, and shell scripting for both batch and real-time workloads.
  • Architect and deliver modern data platforms, including Lakehouse architecture, Data Mesh, Data Fabric, and domain-aligned data products.
  • Develop full-stack applications and internal engineering tools using Python, shell scripting, and modern web frameworks (e.g., Flask, React).
  • Design and implement secure APIs and microservices to expose data assets and machine learning models to downstream systems and user interfaces.
  • Collaborate closely with data scientists to operationalize machine learning models using Cloudera Machine Learning (CML).
  • Implement enterprise-grade security and governance controls, including RBAC, LDAP, Kerberos, Apache Ranger, and row-level access control.
  • Perform performance tuning and optimization of data applications on Hadoop to ensure optimal resource utilization.
  • Support sandbox and playpen environments for rapid prototyping, enabling users to build ML models, dashboards, and data pipelines.
  • Evaluate open-source projects and frameworks and harden them into enterprise-ready frameworks.
  • Design, build, and deploy GenAI, Agentic AI, and LLM-based applications to enable intelligent data exploration, summarization, and automation (good-to-have experience).

Required Skills & Experience

  • Data Engineering: Strong hands-on experience with the Hadoop ecosystem (Hive, Impala, Spark, Kafka, Iceberg, Ranger, Atlas, NiFi, Flink, etc.) and data pipeline orchestration.
  • Full-Stack Development: Proficiency in Java, Python, shell scripting, RESTful API development, and web frameworks such as Flask and React.
  • Machine Learning & AI: Experience working with ML platforms such as CML, Spark MLlib, and Python ML libraries (scikit-learn, XGBoost), including model deployment.
  • Security & Governance: Solid understanding of enterprise data security frameworks, including LDAP, Kerberos, RBAC, data masking, and fine-grained access control.
  • Performance Optimization: Demonstrated ability to tune and optimize data queries and applications in large-scale Hadoop environments.
  • Tools & Platforms: Experience with Cloudera Data Platform (CDP), Cloudera Data Services (CDS), CML, Informatica, Qlik Sense, Apache Oozie, Git, and CI/CD pipelines.
  • Soft Skills: Strong analytical and problem-solving skills, effective communication, and the ability to collaborate across cross-functional teams.
  • GenAI/Agentic AI Applications: Exposure to building enterprise-grade applications using large language models and frameworks (e.g., Hugging Face, LangChain) (good-to-have experience).


Job ID: 144064489
