Summary
Our talented Data & AI Practice is made up of globally recognized experts - and there's room for more analytical and ambitious data professionals. If you're passionate about helping clients make better data-driven decisions to tackle their most complex business issues, let's talk. Take your skills to a new level and launch a career where you can truly do what matters.
Key Responsibilities
- Data Pipeline Development: Design, build, and maintain robust and scalable ETL/ELT pipelines using Databricks, PySpark/Scala, and SQL to ingest, transform, and load data from diverse sources (e.g., databases, APIs, streaming services) into Delta Lake (see the illustrative sketch after this list).
- Databricks Ecosystem: Apply core Databricks features such as Delta Lake, Databricks Workflows (Jobs), Databricks SQL, and Unity Catalog for pipeline orchestration, data management, and governance.
- Performance Optimization: Tune and optimize Spark jobs and Databricks clusters for maximum efficiency, performance, and cost-effectiveness.
- Data Quality and Governance: Implement data quality checks, validation rules, and observability frameworks. Adhere to data governance policies and leverage Unity Catalog for fine-grained access control.
- Collaboration: Work closely with Data Scientists, Data Analysts, and business stakeholders to translate data requirements into technical solutions and ensure data is structured to support analytics and machine learning use cases.
- Automation & DevOps: Implement CI/CD and DataOps principles for automated deployment, testing, and monitoring of data solutions.
- Documentation: Create and maintain technical documentation for data pipelines, data models, and processes.
- Troubleshooting: Monitor production pipelines, troubleshoot complex issues, and perform root cause analysis to ensure system reliability and stability.
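To give a concrete sense of the day-to-day work behind these responsibilities, here is a minimal, illustrative PySpark sketch of a batch ELT step on Databricks. It is not a production pipeline, and the source path, catalog, schema, table, and column names are hypothetical placeholders.

```python
# Minimal illustrative batch ELT step on Databricks.
# Paths, catalog/schema/table names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

# In Databricks notebooks and jobs a SparkSession is provided; this also works standalone.
spark = SparkSession.builder.getOrCreate()

# Ingest: load raw JSON landed by an upstream process (hypothetical path).
raw = spark.read.json("/Volumes/main/raw/orders/")

# Transform: standardize types, derive columns, and apply a simple validation rule.
clean = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .filter(F.col("order_id").isNotNull())  # basic data quality: drop rows missing the key
)

# Load: append to a Delta table governed by Unity Catalog (hypothetical three-level name).
clean.write.format("delta").mode("append").saveAsTable("main.analytics.orders_clean")
```

In practice a step like this would be scheduled and monitored with Databricks Workflows, with quality checks and access control extended through Unity Catalog.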
Qualifications
Skills and experience:
- 5+ years of hands-on experience in Data Engineering.
- 3+ years of dedicated experience building solutions on the Databricks Lakehouse Platform.
- Expert proficiency in Python (PySpark) and SQL for data manipulation and transformation.
- In-depth knowledge of Apache Spark and distributed computing principles.
- Experience with Delta Lake and Lakehouse architecture.
- Strong understanding of ETL/ELT processes, data warehousing, and data modeling concepts.
- Familiarity with at least one major cloud platform (AWS, Azure, or GCP) and its relevant data services.
- Experience with Databricks features like Delta Live Tables (DLT), Databricks Workflows, and Unity Catalog (a brief DLT sketch follows this list).
- Experience with streaming technologies (e.g., Kafka, Spark Structured Streaming).
- Familiarity with CI/CD tools and Infrastructure-as-Code (e.g., Terraform, Databricks Asset Bundles).
- Databricks Certified Data Engineer Associate or Professional certification.
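By way of illustration only, the Delta Live Tables and data quality experience listed above typically looks something like the following sketch. The source path, table names, and expectation rules are hypothetical; in DLT pipelines the `spark` session is supplied by the runtime.

```python
# Minimal illustrative Delta Live Tables sketch; names and rules are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader (hypothetical path).")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")          # Auto Loader incremental ingestion
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/orders/")
    )

@dlt.table(comment="Validated orders.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # enforced quality rule
@dlt.expect("non_negative_amount", "amount >= 0")              # tracked but not enforced
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("order_ts", F.to_timestamp("order_ts"))
    )
```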