Define, maintain and version data standards, definitions and business glossary for in‑scope domains; deliver a single-source-of-truth catalogue entry for each CDE.
Document Critical Data Elements (CDEs) with classification (sensitivity/confidentiality), lineage, acceptable values, usage rules and retention.
Deliverable: CDEs documented for X domains within Y weeks (to be agreed at onboarding).
Design, implement and maintain data quality rules, scorecards and a versioned rules catalogue for critical data elements (CDEs) in alignment with governance policies.
Perform data profiling, statistical analysis and anomaly detection to surface and prioritize issues; perform root‑cause analysis and coordinate remediation.
Embed automated checks and validations into ETL/ELT pipelines (Databricks/Spark or equivalent) to enable continuous testing and end‑to‑end lineage of quality metrics.
Facilitate the data ownership/stewardship model. Establish and publish roles, responsibilities, decision rights, and escalation paths. Deliverable: steward RACI and escalation matrix within the first 6 weeks.
Translate governance policies into technical and operational rules together with Data Engineers (naming conventions, modelling guidelines, access controls).
Deliverable: implementable rule set and sample automated checks for priority datasets within 3 months.
Manage the metadata repository/data catalogue: ensure discoverability, lineage, and business context for sources and transformations.
Configure, integrate and operate data quality tooling (e.g., Great Expectations, Deequ, Databricks native checks, Microsoft Purview, or approved DQ tools), including dashboards, alerts and SLA monitoring.
Triage and coordinate remediation of data issues: maintain issue logs, lead RCA workshops, assign actions, and track SLAs to closure. Deliverable: weekly exception reports and SLA dashboard.
Define and monitor data governance and quality KPIs (completeness, accuracy, timeliness, uniqueness, policy adherence); agree baseline targets during onboarding and provide monthly dashboards.
Define and socialize target thresholds, exception handling and remediation workflows; own the exceptions queue until closure and verify fixes.
Apply CI/CD best practices for quality rules and tests; maintain automation to prevent regressions.
Support data access governance and privacy: participate in access reviews, assist Legal/Compliance on regulatory requests and ensure controls meet policy (GDPR/CCPA as applicable).
Create and run training, playbooks and onboarding kits for stewards and data consumers; recommend and help implement automation (catalogue integrations, alerting).
Act as liaison across business, analytics, engineering and compliance to balance risk mitigation and business enablement.
Work with procurement and vendor teams on tool selection, SOW review and vendor onboarding when external solutions or services are proposed.
Evangelize DQ best practices, contribute to governance standards, and provide training/handovers to data stewards.
Profile
Bachelor's degree in Information Systems, Computer Science, Data Management, or related field (or equivalent experience).
3–5 or more years of experience in data governance, data management, data quality or related roles.
Practical experience with metadata/catalogue tools and data-quality frameworks.
Strong SQL skills and proficiency in one or more programming languages used for data work (Python preferred; Scala/Java acceptable).
Familiarity with privacy/regulatory frameworks (e.g., GDPR, CCPA) and data access controls.
Strong stakeholder management, facilitation, and written/verbal communication skills.
Ability to work with engineering teams to implement technical controls.
Hands‑on experience with Databricks/Spark or equivalent big data platforms.
Practical experience implementing DQ frameworks or tools (e.g., Great Expectations, Deequ, Informatica DQ, Talend, or native Databricks checks).
Experience with data catalogue/lineage tools (e.g., Microsoft Purview, Alation) and familiarity with metadata management concepts.
Experience with cloud platforms (Azure, AWS or GCP) and storage formats (Delta Lake, Parquet).
Experience with observability/monitoring tools (Grafana, Datadog, Prometheus).
Certifications (Databricks, Azure Data Engineer, CDMP) or experience with data privacy and regulatory controls (GDPR/CCPA).
Prior experience working with third‑party vendors, drafting SOWs, or managing outsourced DQ implementations.
Familiarity with agile delivery, CI/CD tooling, and automated testing frameworks.
Fluent English and Mandarin to communicate with client teams based in China and Hong Kong.