Role: Lead Data Engineer
Location: Kuala Lumpur, Malaysia (WFO, 5 Days)
Company: Confidential
Payroll Company: IIT Matrix
Long-Term Contract with potential Extension
Job Description
We are building a dedicated Platform Engineering Team of five members; please find the details below.
The roles are for Senior Developers with 7-8 years of experience.
Outlined below are the key responsibilities and the required skills and experience for this team.
Key Responsibilities
- Design and develop scalable, high-performance data pipelines across Hadoop ecosystem components (Hive, Impala, Spark, Kafka, and Iceberg).
- Build robust data ingestion and transformation frameworks using Java, Spark, Python, and shell scripting for both batch and real-time processing.
- Architect and deliver modern data platforms, including Lakehouse architecture, Data Mesh, Data Fabric, and domain-aligned data products.
- Develop full-stack applications and internal engineering tools using Python, shell scripting, and modern web frameworks (e.g., Flask, React).
- Design and implement secure APIs and microservices to expose data assets and machine learning models to downstream systems and user interfaces.
- Collaborate closely with data scientists to operationalize machine learning models using Cloudera Machine Learning (CML).
- Implement enterprise-grade security and governance controls, including RBAC, LDAP, Kerberos, Apache Ranger, and row-level access control.
- Perform performance tuning and optimization of data applications on Hadoop to ensure optimal resource utilization.
- Support sandbox and playpen environments for rapid prototyping, enabling users to build ML models, dashboards, and data pipelines.
- Evaluate open-source projects and frameworks and harden them into enterprise-ready solutions.
- Design, build, and deploy GenAI, Agentic AI, and LLM-based applications to enable intelligent data exploration, summarization, and automation (good-to-have experience).
Required Skills & Experience
- Data Engineering: Strong hands-on experience with the Hadoop ecosystem (Hive, Impala, Spark, Kafka, Iceberg, Ranger, Atlas, NiFi, Flink, etc.) and data pipeline orchestration.
- Full-Stack Development: Proficiency in Java, Python, shell scripting, RESTful API development, and web frameworks such as Flask and React.
- Machine Learning & AI: Experience working with ML platforms such as CML, Spark MLlib, and Python ML libraries (scikit-learn, XGBoost), including model deployment.
- Security & Governance: Solid understanding of enterprise data security frameworks, including LDAP, Kerberos, RBAC, data masking, and fine-grained access control.
- Performance Optimization: Demonstrated ability to tune and optimize data queries and applications in large-scale Hadoop environments.
- Tools & Platforms: Experience with Cloudera Data Platform (CDP), Cloudera Data Services (CDS), CML, Informatica, Qlik Sense, Apache Oozie, Git, and CI/CD pipelines.
- Soft Skills: Strong analytical and problem-solving skills, effective communication, and the ability to collaborate across cross-functional teams.
- GenAI / Agentic AI Applications: Exposure to building enterprise-grade applications using large language models and frameworks (e.g., Hugging Face, LangChain) (good-to-have experience).