Job Summary
We are seeking a Junior Data Engineer with a strong interest in cloud data platforms and backend/data integration engineering. You will build and maintain data pipelines on GCP, integrate data from APIs & web sources, and support high-quality datasets used across the business.
This role is ideal for someone with solid Python and SQL fundamentals who wants to expand into workflow orchestration, large-scale data warehousing, and API-driven data ingestion.
Key Responsibilities
Data Pipelines & Integration
- Build and maintain ETL/ELT pipelines to ingest data into BigQuery.
- Integrate external & internal data sources via:
  - REST / GraphQL APIs
  - Batch file ingestion (CSV, JSON, Parquet)
  - Web scraping (Python requests, BeautifulSoup, Selenium, etc.)
- Develop orchestration workflows using Airflow / Cloud Composer.
- Work with Pub/Sub, Cloud Storage, and Cloud Run for data ingestion.
Data Warehousing & Modeling
- Support data transformations and modeling in BigQuery.
- Apply basic data warehouse principles (star schemas, slowly changing dimension (SCD) concepts).
- Create curated datasets to support analytics and reporting.
Software Engineering
- Write clean, maintainable Python code using OOP and modular design.
- Build reusable connectors, scraping scripts, and data ingestion frameworks.
- Participate in code reviews and follow engineering best practices.
- Use CI/CD pipelines (Cloud Build, GitHub Actions, GitLab CI, etc.) for deployments.
Data Quality & Operations
- Assist with pipeline monitoring, alerting, and troubleshooting.
- Validate data accuracy and completeness.
- Maintain documentation of workflows, data models, and sources.
Collaboration
- Work closely with senior engineers, analysts, and business teams.
- Translate requirements into technical tasks & solutions.
- Communicate status, blockers, and risks effectively.
Required Skills & Qualifications
- 0–2 years of experience in Data Engineering, Software Engineering, or a related field.
- Strong programming experience in Python (requests, pandas, etc.).
- Comfortable working with SQL for data transformation and analysis.
- Experience or strong interest in:
  - BigQuery
  - Cloud Composer / Airflow
  - GCP components: GCS, Pub/Sub
- Basic knowledge of:
  - REST APIs & authentication
  - JSON / CSV formats
- Hands-on experience with Git version control.
Tech Stack Exposure
- Languages: Python, SQL
- Orchestration: Airflow / Cloud Composer
- Data Warehouse: BigQuery
- Storage: GCS
- Messaging: Pub/Sub
- Integration: REST APIs, web scraping
- Tools: Git, Docker
- Optional: Terraform, Cloud Run
Preferred (Nice to Have)
- Experience with API integration libraries (requests, aiohttp).
- Web scraping experience (BeautifulSoup, Scrapy, Selenium).
- Familiarity with Docker.
- Experience with Terraform / IaC.
- Knowledge of Dataflow/Beam or Spark.
- Familiarity with event-driven design.