Summary
Our talented Data & AI Practice is made up of globally recognized experts - and there's room for more analytical and ambitious data professionals. If you're passionate about helping clients make better data-driven decisions to tackle their most complex business issues, let's talk. Take your skills to a new level and launch a career where you can truly do what matters.
Key Responsibilities
- Data Pipeline Development: Design, build, and maintain robust and scalable ETL/ELT pipelines using Databricks, PySpark/Scala, and SQL to ingest, transform, and load data from diverse sources (e.g., databases, APIs, streaming services) into Delta Lake (see the illustrative sketch after this list).
- Databricks Ecosystem: Apply core Databricks features such as Delta Lake, Databricks Workflows (Jobs), Databricks SQL, and Unity Catalog for pipeline orchestration, data management, and governance.
- Performance Optimization: Tune and optimize Spark jobs and Databricks clusters for maximum efficiency, performance, and cost-effectiveness.
- Data Quality and Governance: Implement data quality checks, validation rules, and observability frameworks. Adhere to data governance policies and leverage Unity Catalog for fine-grained access control.
- Collaboration: Work closely with Data Scientists, Data Analysts, and business stakeholders to translate data requirements into technical solutions and ensure data is structured to support analytics and machine learning use cases.
- Automation & DevOps: Implement CI/CD and DataOps principles for automated deployment, testing, and monitoring of data solutions.
- Documentation: Create and maintain technical documentation for data pipelines, data models, and processes.
- Troubleshooting: Monitor production pipelines, troubleshoot complex issues, and perform root cause analysis to ensure system reliability and stability.
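To give a concrete sense of the day-to-day work behind these responsibilities, here is a minimal, illustrative PySpark sketch of a batch ELT step on Databricks. It is not a production pipeline, and the source path, catalog, schema, table, and column names are hypothetical placeholders.

```python
# Minimal illustrative batch ELT step on Databricks.
# Paths, catalog/schema/table names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

# In Databricks notebooks and jobs a SparkSession is provided; this also works standalone.
spark = SparkSession.builder.getOrCreate()

# Ingest: load raw JSON landed by an upstream process (hypothetical path).
raw = spark.read.json("/Volumes/main/raw/orders/")

# Transform: standardize types, derive columns, and apply a simple validation rule.
clean = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .filter(F.col("order_id").isNotNull())  # basic data quality: drop rows missing the key
)

# Load: append to a Delta table governed by Unity Catalog (hypothetical three-level name).
clean.write.format("delta").mode("append").saveAsTable("main.analytics.orders_clean")
```

In practice a step like this would be scheduled and monitored with Databricks Workflows, with quality checks and access control extended through Unity Catalog.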
Qualifications
Skills and experience:
- 5+ years of hands-on experience in Data Engineering.
- 3+ years of dedicated experience building solutions on the Databricks Lakehouse Platform.
- Expert proficiency in Python (PySpark) and SQL for data manipulation and transformation.
- In-depth knowledge of Apache Spark and distributed computing principles.
- Experience with Delta Lake and Lakehouse architecture.
- Strong understanding of ETL/ELT processes, data warehousing, and data modeling concepts.
- Familiarity with at least one major cloud platform (AWS, Azure, or GCP) and its relevant data services.
- Experience with Databricks features like Delta Live Tables (DLT), Databricks Workflows, and Unity Catalog (a brief DLT sketch follows this list).
- Experience with streaming technologies (e.g., Kafka, Spark Structured Streaming).
- Familiarity with CI/CD tools and Infrastructure-as-Code (e.g., Terraform, Databricks Asset Bundles).
- Databricks Certified Data Engineer Associate or Professional certification.
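By way of illustration only, the Delta Live Tables and data quality experience listed above typically looks something like the following sketch. The source path, table names, and expectation rules are hypothetical; in DLT pipelines the `spark` session is supplied by the runtime.

```python
# Minimal illustrative Delta Live Tables sketch; names and rules are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader (hypothetical path).")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")          # Auto Loader incremental ingestion
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/orders/")
    )

@dlt.table(comment="Validated orders.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # enforced quality rule
@dlt.expect("non_negative_amount", "amount >= 0")              # tracked but not enforced
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("order_ts", F.to_timestamp("order_ts"))
    )
```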