Role Overview:
We are seeking a highly skilled Cloud Data Engineer to design, build, and optimize data pipelines, AI-driven solutions, and cloud-based architectures. This role is ideal for individuals with strong logical and analytical thinking skills and hands-on experience with ETL processes, dashboards, and data engineering using PySpark. This is an exciting opportunity to work on Generative AI innovations and AWS native technologies in a hybrid work environment.
Key Responsibilities:
1. Data Pipeline and ETL Development:
- Build, optimize, and manage ETL pipelines using PySpark and AWS Glue for large-scale data processing.
- Design robust data workflows for processing structured and unstructured data.
- Ensure data integrity and security in all stages of processing.
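To give a concrete flavor of this responsibility, here is a minimal, stdlib-only sketch of the extract-transform-load shape such pipelines follow. The data, field names, and stages are invented for illustration; a production job would use PySpark or AWS Glue and read from S3 rather than an in-memory string:

```python
import csv
import io

# Toy "extract" source: in a real Glue/PySpark job this would be an S3 read.
RAW = "order_id,amount\n1,19.99\n2,\n3,5.00\n"

def extract(raw_text):
    """Parse CSV rows from a raw text source (stand-in for an S3 read)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Drop rows with missing amounts and cast types -- a basic
    data-integrity step of the kind the responsibilities describe."""
    return [
        {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
        for r in rows
        if r["amount"]
    ]

def load(rows):
    """Stand-in "load" stage: aggregate instead of writing to a warehouse."""
    return sum(r["amount"] for r in rows)

clean = transform(extract(RAW))
total = load(clean)  # two valid rows survive; total is roughly 24.99
```

The same extract/transform/load separation carries over directly to PySpark, where each stage becomes a DataFrame operation.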
2. Dashboard and Data Visualization:
- Design and develop dashboards using tools like AWS QuickSight, Tableau, or Power BI.
- Collaborate with stakeholders to create insightful visualizations for data-driven decision-making.
3. AI/ML Model Development and Deployment:
- Develop, deploy, and maintain AI/ML models using frameworks such as TensorFlow, PyTorch, or Scikit-learn.
- Implement models on cloud platforms using AWS SageMaker and automate model training and deployment pipelines.
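In miniature, the train-serialize-deploy flow that a SageMaker pipeline automates looks like the sketch below. The model here is a deliberately trivial stand-in (it predicts the training mean), not a real TensorFlow or Scikit-learn model; the point is the artifact lifecycle:

```python
import pickle
import statistics

class MeanModel:
    """Trivial stand-in for an ML model: predicts the training mean.
    Hypothetical class for illustration only."""
    def fit(self, y):
        self.mean_ = statistics.fmean(y)
        return self

    def predict(self):
        return self.mean_

# "Training" step.
model = MeanModel().fit([10.0, 20.0, 30.0])

# "Deployment" step: serialize the trained artifact (as a pipeline would
# push to S3), then reload it where an endpoint serves predictions.
artifact = pickle.dumps(model)
served = pickle.loads(artifact)
prediction = served.predict()  # 20.0
```

Automating exactly this hand-off (train, persist the artifact, load it behind an endpoint) is what the deployment-pipeline responsibility refers to.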
4. Cloud Infrastructure and Data Management:
- Architect and deploy scalable data solutions using AWS services like Redshift, EMR, and S3.
- Use Infrastructure as Code tools (e.g., Terraform, AWS CDK, or CloudFormation) to automate deployments.
5. Performance Optimization:
- Optimize ETL pipelines, AI models, and data queries for performance, cost-efficiency, and scalability.
- Monitor data workflows and resolve bottlenecks proactively.
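One common class of bottleneck the optimization work targets is an unindexed lookup inside a join. The sketch below (with invented data) shows the pattern in plain Python: the same logic in PySpark would be a join without a broadcast or partitioning strategy, fixed the same way, by building a hash index once:

```python
import random

# Hypothetical workload: match event records to user records.
users = [{"id": i, "name": f"user{i}"} for i in range(10_000)]
events = [random.randrange(10_000) for _ in range(1_000)]

def join_scan(events, users):
    """Naive join: a full O(n) scan of users for every event."""
    return [next(u for u in users if u["id"] == e) for e in events]

def join_indexed(events, users):
    """Optimized join: build a hash index once, then O(1) lookups."""
    by_id = {u["id"]: u for u in users}
    return [by_id[e] for e in events]

# Both produce identical results; only the cost differs.
sample = events[:50]
assert join_scan(sample, users) == join_indexed(sample, users)
```

Profiling a slow pipeline stage and replacing a scan-per-row pattern with a keyed lookup is a typical proactive fix of the kind described above.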
6. Explore AWS and Generative AI Innovations:
- Gain hands-on experience with Generative AI tools and frameworks to create innovative data and AI solutions.
- Experiment with the latest AWS native technologies to enhance data pipelines and AI projects.
Requirements:
- Education: Bachelor's degree in Computer Science, Data Science, Engineering, or a related field. Equivalent practical experience will also be considered.
- Hands-on experience with ETL pipelines using PySpark and data transformation tools like AWS Glue.
- Proficiency in building interactive dashboards with tools like Tableau, AWS QuickSight, or Power BI.
- Strong programming skills in Python (preferred) or other languages for data processing and AI/ML development.
- Familiarity with cloud platforms (AWS preferred) and services like S3, Redshift, and SageMaker.
- Strong logical and analytical thinking skills for solving complex data problems.
- Knowledge of SQL and database management systems.
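As a small illustration of the SQL requirement, the snippet below runs a grouped aggregate against an in-memory SQLite database. The table and column names are invented; the same query shape applies to Redshift or any other warehouse:

```python
import sqlite3

# In-memory database standing in for a warehouse; schema is illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)

# A grouped aggregate -- the kind of query the role exercises daily.
rows = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 150.0), ('west', 75.0)]
```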
Preferred Skills:
- Relevant AWS Certifications (e.g., AWS Certified Data Analytics, AWS Certified Machine Learning) are a strong plus.
- Senior candidates (4+ years) should demonstrate expertise in PySpark, dashboard development, large-scale data processing, and AI/ML model deployment.
- Familiarity with monitoring tools for data pipelines and AI workflows.
- Strong communication skills for collaborating across teams and presenting data insights.