
2X

Senior Data Engineer

  • Posted 8 days ago

Job Description

Senior Data Engineer: AI Data Foundations & Automation Enablement (BigQuery + MySQL, Automation-First, Retrieval-Aware)

Why this role exists

Our AI Innovation Lab delivers AI-driven workflows, agentic automations, and intelligent decisioning across the business. While the team is strong in workflow architecture, prompt engineering, and automation delivery (including n8n-based orchestration), we lack senior ownership of the data foundations that determine whether automations and AI agents behave correctly in production.

In our environment:

  • MySQL systems serve as operational systems of record
  • BigQuery is the primary AI, analytics, and automation-serving layer
  • Automation tools (e.g., n8n) execute workflows that depend on accurate, fresh, and well-governed data

This role owns the end-to-end data layer that feeds AI agents and automations, ensuring workflows act on the right data, at the right time, with clear guarantees around quality, freshness, and access.

What you'll do

Own AI- and Automation-Critical Data Foundations

  • Own production operation of AI- and automation-serving datasets in BigQuery.
  • Define and enforce SLAs/SLOs for data freshness, pipeline reliability and schema stability.
  • Build runbooks and participate in incident response for data-related failures.
  • Proactively prevent silent failures (partial loads, schema drift, stale data feeding automations).

Model Data for AI & Automation (Schemas + Metadata)

  • Design canonical schemas optimized for automation and agent consumption, including knowledge and reference data; product and policy entities; and customer/account and workflow-related data.
  • Define metadata standards that support filtering and segmentation; permission and access enforcement; freshness and lifecycle awareness; and traceability and explainability.
  • Manage schema evolution using versioning and backward-compatibility patterns so workflows do not break unexpectedly.
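As a flavor of the backward-compatibility checks this responsibility implies, here is a minimal sketch of a guard that rejects breaking schema changes before they reach consumers. The rule set (additive-only, no drops, no type changes) and the field names are illustrative assumptions, not the team's actual policy.

```python
# Hypothetical guard: a proposed schema change is backward compatible only if
# every existing column survives with the same type; new columns may be added.
def is_backward_compatible(current: dict, proposed: dict) -> bool:
    """current/proposed map column name -> BigQuery type string."""
    for name, col_type in current.items():
        if name not in proposed:        # dropped column breaks downstream workflows
            return False
        if proposed[name] != col_type:  # type change breaks downstream workflows
            return False
    return True
```

Under this rule, adding a nullable `created_at` column passes, while dropping or retyping an existing column fails and would instead require publishing a new versioned view.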

Build Reliable Incremental Ingestion (MySQL → BigQuery)

  • Design and operate incremental ingestion from MySQL (system of record) into BigQuery (serving layer).
  • Ensure correct handling of inserts, updates, deletes, schema changes, backfills, and replays.
  • Align ingestion cadence and freshness guarantees to automation execution needs, not BI-only reporting schedules.
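The incremental pattern described above can be sketched in a few lines, assuming each MySQL row carries a primary key `id` and an `updated_at` column usable as a high-water mark; the in-memory dicts stand in for the MySQL source and the BigQuery target table, and real CDC pipelines would add delete handling and exactly-once semantics.

```python
# Watermark-based incremental sync (illustrative, not a production CDC pipeline):
# upsert only rows changed since the last watermark, then advance the watermark.
def incremental_sync(source_rows, target, last_watermark):
    """Upsert changed rows into target (keyed by id); return the new watermark."""
    new_watermark = last_watermark
    for row in source_rows:
        if row["updated_at"] > last_watermark:
            target[row["id"]] = row  # insert or update in the serving layer
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark
```

Keeping the watermark in durable state between runs is what makes replays and backfills safe: re-running with an older watermark simply re-upserts the same rows.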

Data Quality, Lineage & Trust (AI-Grade DataOps)

  • Implement data quality gates for automation- and AI-critical datasets, including deduplication and survivorship rules, referential integrity and validation checks, and completeness and anomaly detection.
  • Track lineage from source systems → transformations → BigQuery datasets/views → automation and AI consumption.
  • Surface clear data trust signals (freshness, source, confidence) for downstream workflows.
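A minimal sketch of the kind of quality gate the bullets above describe, run before a dataset is published to automations: it flags duplicate keys and missing required fields. The key and required-field names are assumptions for the example.

```python
# Illustrative pre-publish quality gate: returns a list of violations
# (empty list means the batch may be promoted to the serving layer).
def quality_gate(rows, key="id", required=("id", "email")):
    seen, errors = set(), []
    for i, row in enumerate(rows):
        k = row.get(key)
        if k in seen:  # deduplication / survivorship violation
            errors.append(f"row {i}: duplicate {key}={k}")
        seen.add(k)
        for field in required:  # completeness check
            if row.get(field) in (None, ""):
                errors.append(f"row {i}: missing {field}")
    return errors
```

A real implementation would add referential-integrity and anomaly checks, but the shape is the same: gate, report, and block promotion rather than let bad rows silently feed an automation.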

Enable Automation & Retrieval (n8n-Aware, Retrieval-Ready)

  • Design data outputs and interfaces that are automation-friendly: predictable schemas, explicit null and edge-case handling, and clear freshness and completeness guarantees.
  • Partner closely with Automation Specialists working in n8n to reduce complex data logic embedded inside workflows, move joins, filtering, and validation upstream into the data layer, and ensure idempotency and safe retries in automation-driven execution patterns.
  • Structure data and metadata to support hybrid retrieval patterns (semantic/vector retrieval combined with structured filtering), where applicable.

(Note: You are not expected to build or own n8n workflows, but your work directly enables their reliability.)
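One common way to make automation retries safe, in the spirit of the idempotency requirement above, is a deterministic idempotency key: the same logical event always hashes to the same key, so a retried workflow run can detect that a write was already applied. The event fields used here are hypothetical.

```python
import hashlib

# Illustrative sketch: derive a stable idempotency key from the fields that
# identify a logical event, so a retry does not double-apply the same write.
def idempotency_key(event: dict) -> str:
    raw = f'{event["entity_id"]}|{event["action"]}|{event["occurred_at"]}'
    return hashlib.sha256(raw.encode()).hexdigest()
```

The consuming workflow records the key on first execution and skips (or returns the cached result for) any event whose key it has already seen.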

Governance, Privacy & Access Controls

  • Implement enterprise-grade governance in BigQuery, including IAM patterns and dataset/domain segmentation; row- and column-level access approaches where appropriate; and masking and consent/purpose enforcement.
  • Ensure PII and sensitive data are protected by design in pipelines and exposed interfaces.
  • Support security and compliance reviews with traceable, auditable controls.

Expose & Enable (Data Products & Contracts)

  • Publish stable, versioned data interfaces (tables, views, semantic layers, retrieval feeds) consumed by automation workflows (n8n), AI agents and services.
  • Maintain backward compatibility and clear deprecation paths.
  • Produce concise documentation that enables parallel delivery across teams.

Monitor, Optimize & Control Cost

  • Optimize BigQuery performance and spend using partitioning and clustering, query pattern optimization, and materialization strategies.
  • Instrument dashboards and alerts for pipeline reliability, freshness SLAs, query cost and performance, and access and policy violations.
  • Drive post-incident remediation and preventative improvements.
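To make the cost levers above concrete, here is a small helper that emits BigQuery DDL combining date partitioning with clustering, the two mechanisms that let queries prune scanned bytes. The table and column names are hypothetical, and a real deployment would set expiration and labeling options as well.

```python
# Illustrative BigQuery DDL builder (assumed names; not the team's actual schema).
def partitioned_table_ddl(table, columns, partition_col, cluster_cols):
    """columns is a list of (name, type) pairs; partition_col must be a TIMESTAMP/DATE column."""
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"PARTITION BY DATE({partition_col}) "
        f"CLUSTER BY {', '.join(cluster_cols)}"
    )
```

Queries that filter on the partition column then scan only the matching partitions, which is typically the single largest driver of predictable BigQuery spend.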

Skills & competencies we're after

Required (Primary)

  • Strong experience with BigQuery in production (performance, cost, governance).
  • Strong experience ingesting from MySQL systems of record.
  • Proven experience building incremental / CDC-style pipelines.
  • Advanced SQL and strong data modeling skills.
  • Hands-on experience with data quality, lineage, and operational DataOps.
  • Experience designing data interfaces consumed by automation or integration platforms.
  • Clear communication and cross-functional collaboration skills.

Strongly Valued

  • Experience working with workflow automation platforms (e.g., n8n, iPaaS tools, event-driven orchestration).
  • Familiarity with retrieval or search systems (keyword, semantic, or hybrid).
  • Exposure to vector search concepts or systems (managed vector stores or Postgres/pgvector).
  • Experience supporting AI or RAG-style systems in production environments.

Minimum qualifications

  • 6+ years in data engineering, analytics engineering, or platform data roles with production ownership.
  • Demonstrated ownership of end-to-end data pipelines and reliability.
  • Experience operating in enterprise environments with governance and compliance requirements.

Nice-to-haves

  • Experience with data observability tooling.
  • Experience in regulated or security-sensitive environments.
  • Familiarity with agentic AI concepts.
  • Experience defining data as products with contracts and SLAs.

What success looks like

  • Automation workflows consistently run against fresh, correct, and governed data.
  • Data-related automation and AI failures materially decrease.
  • New AI and automation use cases onboard faster due to reusable schemas and contracts.
  • Governance is enforced by design, accelerating security and compliance approvals.
  • BigQuery performance and cost remain predictable as usage scales.
  • Automation Specialists can focus on orchestration, not compensating for data issues.

About Company

2X

Job ID: 140441253
