
1. Role Purpose
As a member of the Migration Team, you will execute the technical movement and transformation of legacy data from IBM Netezza and legacy Cloudera/Talend platforms to a modernized CDP Private Cloud Base environment. Your focus is on re-engineering ETL logic into Spark and ensuring data integrity across the Bronze, Silver, and Gold layers.
2. Business & Operational Responsibilities
∙Legacy Workload Modernization: Execute the migration of Netezza tables and legacy
Cloudera tables to the new EDM platform.
∙ETL Transformation: Implement the migration and optimization of rationalized
DataStage jobs into a Spark-native framework.
∙Regulatory Compliance: Develop pipelines that specifically enable BNM Project
STREAM reporting, ensuring that all data elements are conformed to the Common
Data Model.
∙Quality & Reconciliation: Perform source-to-target data parity checks, including hash totals and row-count validations, to maintain 100% data integrity (a minimal reconciliation sketch follows this list).
∙Operational Readiness: Support 90-day parallel runs and hypercare activities to
ensure zero disruption to daily business operations during cutover.
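For illustration only, the parity checks described above might be sketched in PySpark as follows; the table names (legacy_db.accounts, edm_bronze.accounts) and the hash-total strategy are assumptions, not part of the role description.

```python
# Minimal source-to-target reconciliation sketch (PySpark).
# Table names and the hashing scheme are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("parity-check").getOrCreate()

src = spark.table("legacy_db.accounts")   # staged Netezza extract (assumed)
tgt = spark.table("edm_bronze.accounts")  # migrated Bronze table (assumed)

def fingerprint(df):
    """Row count plus an order-independent hash total over all columns.
    Note: concat_ws skips nulls; a production check would normalize them."""
    hashed = df.select(F.md5(F.concat_ws("||", *df.columns)).alias("h"))
    total = hashed.agg(
        F.sum(F.conv(F.substring("h", 1, 8), 16, 10).cast("long"))
    ).first()[0]
    return hashed.count(), total

src_rows, src_hash = fingerprint(src)
tgt_rows, tgt_hash = fingerprint(tgt)

assert src_rows == tgt_rows, f"Row count mismatch: {src_rows} vs {tgt_rows}"
assert src_hash == tgt_hash, f"Hash total mismatch: {src_hash} vs {tgt_hash}"
```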
3. Technical Requirements
A. Data Engineering & Migration Execution
∙Spark on YARN: Build and optimize high-throughput ETL/ELT pipelines using Spark 3,
moving away from legacy RDBMS-based processing.
∙Table Format Implementation: Modernize legacy table structures into Apache Iceberg to support ACID transactions, time travel, and row-level updates (a Spark-to-Iceberg sketch follows this list).
∙Automated Conversion: Utilize LLM-based accelerators to parse DataStage DSX export files, mapping legacy operators to equivalent Spark tasks.
∙High-Performance Serving: Migrate critical data marts into Apache Kudu native tables to provide low-latency analytics for Qlik Sense and SAS consumers (a Kudu write sketch follows this list).
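For illustration, the Spark 3 and Iceberg items above might combine into a job like the sketch below; the catalog and table names (edm.silver.customer), the landing path, and the CDC flag column (op) are assumptions.

```python
# Hypothetical Spark 3 job upserting a CDC batch into an Iceberg table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-cdc-upsert")
    # Assumes an Iceberg catalog named "edm" is configured on the cluster.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.edm", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.edm.type", "hive")
    .getOrCreate()
)

# Incremental batch landed by NiFi (path is an assumption).
spark.read.parquet("/landing/customer_cdc/").createOrReplaceTempView("cdc_batch")

# Iceberg row-level MERGE: ACID upserts and deletes in one statement.
spark.sql("""
    MERGE INTO edm.silver.customer t
    USING cdc_batch s
    ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Time travel (Spark 3.3+ syntax); the snapshot id is illustrative.
spark.sql("SELECT * FROM edm.silver.customer VERSION AS OF 4161982707447874310")
```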
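Similarly, publishing a curated mart to Kudu might look like the following sketch via the kudu-spark connector; the master addresses and table names are assumptions.

```python
# Hypothetical publish of a Gold-layer mart to a Kudu serving table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kudu-serving").getOrCreate()

mart = spark.table("edm.gold.daily_positions")  # assumed mart name

(mart.write
    .format("org.apache.kudu.spark.kudu")
    .option("kudu.master", "kudu-master-1:7051,kudu-master-2:7051")
    .option("kudu.table", "impala::serving.daily_positions")
    .mode("append")
    .save())
```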
B. Ingestion & Orchestration
∙NiFi Data Flow: Configure Apache NiFi flows to land data in the Lakehouse landing
zone first, eliminating uncontrolled point-to-point feeds.
∙Airflow Orchestration: Refactor ZENA workflows and shell scripts into Airflow DAGs, establishing clear dependency management and SLA visibility (a minimal DAG sketch follows this list).
∙CDC Integration: Implement Change Data Capture (CDC) logic to ensure incremental
data loads are synchronized across the modernized platform.
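For illustration, a ZENA job chain refactored into Airflow might look like the sketch below (Airflow 2.x assumed; the DAG id, schedule, and spark-submit commands are invented). The CDC bullet above would typically be the first task, feeding a MERGE job like the Iceberg sketch in section A.

```python
# Hypothetical Airflow DAG replacing a ZENA job chain and shell scripts.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="edm_customer_daily",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # replaces the ZENA calendar trigger (Airflow 2.4+)
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest = BashOperator(
        task_id="ingest_cdc_batch",
        bash_command="spark-submit /jobs/ingest_cdc.py",
        sla=timedelta(hours=1),  # SLA visibility the shell scripts lacked
    )
    merge = BashOperator(
        task_id="merge_to_silver",
        bash_command="spark-submit /jobs/merge_silver.py",
    )
    dq_gate = BashOperator(
        task_id="run_dq_checks",
        bash_command="spark-submit /jobs/gx_checks.py",
    )

    # Explicit dependency chain replaces implicit shell-script ordering.
    ingest >> merge >> dq_gate
```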
C. Governance & Quality Control
∙Data Quality (DQ) Gates: Implement Great Expectations (GX) validation checks (completeness, null rates, distribution checks) at both the ingestion and transformation stages (a minimal GX sketch follows this list).
∙Secure Implementation: Ensure all developed pipelines adhere to Apache Ranger
RBAC/ABAC and Ranger KMS encryption standards.
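For illustration, a DQ gate of the kind described above might be sketched with Great Expectations' legacy Spark dataset API (pre-0.18 GX assumed); the table, columns, and thresholds are invented.

```python
# Hypothetical GX validation gate on a Silver-layer table.
from great_expectations.dataset import SparkDFDataset
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dq-gate").getOrCreate()
gdf = SparkDFDataset(spark.table("edm.silver.customer"))

# Completeness / null-rate checks.
gdf.expect_column_values_to_not_be_null("customer_id")
gdf.expect_column_values_to_not_be_null("account_open_date", mostly=0.99)

# A simple distribution check: balances within an expected range.
gdf.expect_column_values_to_be_between("balance", min_value=0, max_value=1e9)

results = gdf.validate()
if not results.success:
    raise RuntimeError("DQ gate failed; blocking promotion to the Gold layer")
```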
4. Experience & Qualifications
Professional Background
∙Experience Level: Senior-level data engineering experience (minimum 3 years), with hands-on development using NiFi and Airflow. Experience with migrations from Netezza or legacy Cloudera/Hadoop environments is an added advantage.
∙Industry Context: Experience in financial services, with an understanding of banking data and regulatory reporting requirements such as BNM submissions.
Job ID: 147388105