
PayNet (Payments Network Malaysia)

Data Resiliency Engineer (Datalake)

  • Posted 10 hours ago

Job Description

About PayNet: At PayNet, your work doesn't just move money; it moves a nation.

We make every payment count toward Malaysians' shared prosperity by powering the platforms millions use every day, from DuitNow and FPX to MyDebit and JomPAY. Our systems keep Malaysia's digital economy running securely, seamlessly, and inclusively, whether you're tapping, transferring, paying bills, or expanding a business.

If you're excited about creating impact at a national scale and shaping how Malaysia pays, connects, and progresses, you'll fit right in.

About the Technology Division: Building the backbone of Malaysia's payment infrastructure. Led by the Chief Technology Officer, the Technology division designs, evolves, and secures the always-on architecture that powers trusted, scalable digital payment services for the nation.

Summary of the role:

As a Data Resiliency Engineer, you will play a critical role in ensuring the stability, reliability, and continuous improvement of our data platform. You will be responsible for analyzing incidents, identifying root causes, and implementing effective solutions to deliver a seamless user experience. Leveraging your expertise in data ecosystem monitoring, incident management, and data quality optimization, you will help uphold high standards of operational excellence. You will use advanced monitoring and diagnostic tools (such as Datadog and other relevant platforms) to proactively detect, investigate, and resolve issues.

In addition, you will serve as a subject matter expert for data-related inquiries and actively drive data quality initiatives across the organization. You will collaborate closely with development teams to ensure that payment reporting systems, both for external participants and internal stakeholders, are accurate, reliable, and aligned with evolving business requirements, supporting continuous improvement in data-driven operations. Beyond production support, you will also contribute to pre-production readiness processes and participate in major maintenance activities, including product deployments, onboarding preparations, and disaster recovery simulation exercises.

Key Responsibilities:

  • Root Cause Analysis & Bug Fixing: Analyze issues to uncover root causes and implement effective solutions, ensuring a smooth user experience.
  • Data Ecosystem Monitoring: Oversee the data environment using advanced monitoring tools like Datadog, proactively identifying and addressing issues before they escalate.
  • Alert System Management: Collaborate with the team managing the paging tool, ensuring timely responses to critical incidents and maintaining operational excellence.
  • Data Lake Support: Serve as the primary contact for data-related inquiries, including ETL incident management, reporting challenges, and providing actionable insights.
  • Data Quality Monitoring: Own and continuously enhance data quality monitoring tools by designing robust pipelines and frameworks that uphold the highest standards of data integrity. Proactively identify gaps in existing solutions and implement improvements to strengthen the effectiveness and reliability of data quality checks.
  • Critical Service Management and Support: Lead initiatives to ensure critical services consistently meet defined SLAs. Act as the primary point of contact for incident reporting and event management, driving timely recovery and restoration of service operations.
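
To make the data quality responsibility above concrete, here is a minimal, illustrative sketch of a row-level data quality check of the kind such a framework might run. The field names (txn_id, amount, currency) and thresholds are assumptions for illustration, not details from this posting.

```python
# Illustrative data quality check: validate a batch of payment-like records
# and summarise violations, as an alerting framework might before paging a team.
# All field names and rules here are assumed for the sketch.

def check_record(record: dict) -> list[str]:
    """Return a list of data quality violations for one record."""
    violations = []
    # Completeness checks: required fields must be present and non-empty.
    for field in ("txn_id", "amount", "currency"):
        if record.get(field) in (None, ""):
            violations.append(f"missing:{field}")
    # Validity check: amounts should not be negative.
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        violations.append("negative:amount")
    return violations

def run_checks(records: list[dict]) -> dict:
    """Summarise violations across a batch for monitoring/alerting."""
    failed = {i: v for i, rec in enumerate(records) if (v := check_record(rec))}
    return {"total": len(records), "failed": len(failed), "violations": failed}

batch = [
    {"txn_id": "T1", "amount": 10.0, "currency": "MYR"},
    {"txn_id": "T2", "amount": -5.0, "currency": "MYR"},
    {"txn_id": "", "amount": 3.0, "currency": None},
]
summary = run_checks(batch)
# summary["failed"] == 2  (one negative amount, one incomplete record)
```

In a production framework these checks would run inside the pipeline and feed a monitoring tool rather than return a dict, but the structure (per-record rules plus a batch summary) is the same.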

What will make you successful

You will be successful by combining strong expertise in data platform operations, monitoring, and incident management with a disciplined approach to root cause analysis and continuous improvement. The ability to proactively detect, diagnose, and resolve data issues using advanced observability tools, while consistently upholding high standards of data quality and reliability, is critical. Success in this role also depends on close collaboration with development and business teams to ensure payment reporting systems are accurate, resilient, and aligned with evolving requirements. A mindset focused on operational excellence, production readiness, and resilience, supported by active participation in deployments, maintenance activities, and disaster recovery exercises, will enable you to deliver dependable, high‑quality data services at scale.

Must-have:

Data Operations & Production Application Maintenance Experience

  • The candidate should have hands-on experience in production data operations, including a strong understanding of best practices for data patching, data lifecycle management, and the key do's and don'ts when handling production data—particularly within financial domains.
  • The candidate should also possess experience in deploying and maintaining production applications, with a clear understanding of critical pre- and post-deployment activities. This includes the ability to respond effectively to production incidents, as well as communicate clearly and efficiently during service disruptions.

Familiarity with Modern Data Lake and Data Pipeline Architectures

  • The candidate should have a solid understanding and practical experience with modern data pipeline architectures, recognizing that building and maintaining data pipelines differs significantly from traditional software or web application development.
  • Modern data pipelines are typically designed for scalability, observability, and interoperability, as opposed to tightly coupled systems that are difficult to scale and maintain. A strong grasp of these principles will enable the candidate to quickly contribute to and support production data operations.
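
As a rough sketch of the decoupling and observability principles described above (stage names and the metrics hook are illustrative assumptions, not from this posting), each pipeline stage can be an independent function that reports simple metrics:

```python
# Illustrative loosely coupled pipeline: ingest -> transform -> load.
# Each stage is independent and reports row counts to a metrics hook,
# a stand-in for an observability tool such as Datadog.

metrics: dict[str, int] = {}

def observe(stage: str, n: int) -> None:
    """Record per-stage row counts (placeholder for real metrics emission)."""
    metrics[stage] = metrics.get(stage, 0) + n

def ingest(raw: list[str]) -> list[dict]:
    rows = [{"value": line} for line in raw if line.strip()]
    observe("ingest", len(rows))
    return rows

def transform(rows: list[dict]) -> list[dict]:
    out = [{"value": r["value"].upper()} for r in rows]
    observe("transform", len(out))
    return out

def load(rows: list[dict]) -> int:
    observe("load", len(rows))
    return len(rows)

loaded = load(transform(ingest(["a", "", "b"])))
# metrics == {"ingest": 2, "transform": 2, "load": 2}
```

Because the stages only share plain data, any one of them can be scaled, replaced, or re-run independently, which is the contrast with tightly coupled systems the point above draws.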

Advantage to have:

Demonstrates strong knowledge in data engineering concepts and best practices.

  • Proficient in data engineering tools such as PySpark, Polars, and Dask.
  • Hands-on experience building end-to-end data pipelines (from data ingestion through to the data consumption/presentation layer).

Experienced in operating and managing data tools on Kubernetes platforms, including Amazon EKS.

  • The role involves working with tools built on Amazon EKS; familiarity with Kubernetes and EKS is therefore highly advantageous and will enable the candidate to perform more effectively.

Hands-on experience in data pipeline operations, as well as software incident and problem management.

  • Beyond operational experience, exposure to multiple data pipeline or software application incidents is highly valued. This includes the ability to document issues thoroughly, communicate effectively with stakeholders, and apply key practices for managing post-production incidents.

Strong experience with AWS data services, including Amazon S3, AWS Glue, Amazon Athena, Amazon QuickSight, and AWS Lambda.

Job ID: 146121241
