You'll work on migrating applications and data pipelines from a legacy Cloudera environment to a modern Kubernetes-based data platform. This is a hands-on role requiring strong data engineering skills across Spark, Python/Scala, and cloud-native pipeline orchestration.
Key Responsibilities
- Develop, optimize, and migrate data pipelines using Spark 3.5 and Python/Scala
- Convert existing Hive, Spark, and Control-M workflows to Airflow + DBT
- Integrate pipelines with messaging systems (Kafka, Solace) and object stores (S3, MinIO)
- Troubleshoot and tune distributed jobs running on Kubernetes
- Work closely with internal leads/architects to apply engineering best practices
- Build and enhance migration and automation frameworks to accelerate application migration
- Support migration tasks, testing, and production readiness activities
- Participate in daily progress discussions, reviews, and issue resolution
Required Experience
- 6–9 years of hands-on data engineering experience
- Strong knowledge of Apache Spark (batch + streaming) and Hive
- Proficiency in Python, Scala, or Java
- Experience with Airflow or Control-M, and SQL transformation frameworks (DBT preferred)
- Familiarity with Kafka, Solace, S3, MinIO
- Exposure to Docker / Kubernetes deployment workflows
- Experience with Lakehouse formats (Iceberg, Delta Lake, Hudi)
Engagement Expectations
- Work independently on assigned modules
- Deliver high-quality, production-ready outcomes within agreed timelines
- Adhere to client processes, documentation, and compliance standards