Job Posting: AWS Data Pipeline Engineer (ETL - Multi-Source to PostgreSQL)
Project Overview
Build a production-ready ETL pipeline that extracts data from 3 source systems (Oracle/SAP, Microsoft Dataverse CRM, MySQL), transforms it with complex business logic, and loads it into AWS RDS PostgreSQL as 3 master tables. The pipeline runs daily on an automated schedule and serves an application hosted on AWS.
Technical Stack (Required)
AWS Services: Glue (PySpark), Step Functions, RDS PostgreSQL, S3, Secrets Manager, CloudWatch
Languages: Python 3.9+, PySpark 3.x, SQL
IaC: Terraform or CloudFormation
Sources: Oracle (JDBC), MySQL (JDBC), Dataverse API (OAuth 2.0)
Scope
Extract: 10 tables from 3 DBs (~500MB total)
Transform: Complex joins, aggregations, calculated fields to 3 master tables
Load: PostgreSQL with Row-Level Security policies
Orchestrate: Step Functions with error handling, monitoring, alerting
Schedule: Daily execution, completing in under 2 hours
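To make the extraction step in the scope above concrete, here is a minimal sketch of how a Glue/PySpark job might assemble JDBC connection options from a Secrets Manager-style secret. All names (secret keys, table names, partition counts) are illustrative assumptions, not requirements from this posting.

```python
# Sketch of the JDBC extraction step, assuming credentials are stored in
# Secrets Manager as a JSON blob. Secret keys and table names here are
# hypothetical, chosen only for illustration.
import json

def jdbc_options(secret: dict, table: str, num_partitions: int = 4) -> dict:
    """Build the connection options a Glue/Spark JDBC read expects."""
    return {
        "url": secret["jdbc_url"],             # e.g. jdbc:oracle:thin:@host:1521/SID
        "user": secret["username"],
        "password": secret["password"],
        "dbtable": table,
        "numPartitions": str(num_partitions),  # parallel reads for larger tables
        "fetchsize": "10000",                  # rows per round trip to the source DB
    }

# Inside the actual Glue job, these options would feed a Spark read, roughly:
#   df = spark.read.format("jdbc").options(**jdbc_options(secret, "ORDERS")).load()

if __name__ == "__main__":
    fake_secret = {"jdbc_url": "jdbc:mysql://host:3306/db",
                   "username": "etl", "password": "***"}
    print(json.dumps(jdbc_options(fake_secret, "customers"), indent=2))
```

Keeping the option-building logic in a pure function like this makes it easy to unit-test toward the >80% coverage target without a live database.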
Key Challenges
Multi-source integration (JDBC + REST API)
Complex transformations (multi-table joins, aggregations, JSONB structures)
PostgreSQL RLS implementation (role-based data access)
Data quality validation and reconciliation
GDPR compliance (EU region, encryption)
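For the REST API challenge above, a plausible shape for Dataverse-style paginated extraction is sketched below. The Dataverse Web API returns an `@odata.nextLink` URL while more pages remain; the fetch callable is injected so the loop can be exercised without network access or a real OAuth token. Endpoint and field names beyond `@odata.nextLink`/`value` are illustrative.

```python
# Sketch of paginated extraction against a Dataverse-style OData API.
# fetch() is an injected callable that takes a URL and returns a parsed
# JSON page; in production it would attach the OAuth 2.0 bearer token.
from typing import Callable, Iterator

def fetch_all_pages(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield records from every page, following @odata.nextLink until exhausted."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("value", [])        # records live under "value"
        url = page.get("@odata.nextLink")       # absent on the final page

if __name__ == "__main__":
    # Fake two-page response, standing in for real HTTP calls.
    pages = {
        "p1": {"value": [{"id": 1}], "@odata.nextLink": "p2"},
        "p2": {"value": [{"id": 2}]},
    }
    rows = list(fetch_all_pages("p1", pages.__getitem__))
    print(rows)
```

Separating pagination from transport like this also simplifies retry and rate-limit handling in the real job.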
Deliverables
Code: Glue extraction/transformation/load jobs, Step Functions workflow, tests (>80% coverage)
IaC: Terraform/CloudFormation for all AWS resources
Database: DDL scripts with RLS policies
Documentation: Architecture diagram, deployment guide, runbook
Testing: Integration test suite, performance test results
Required Skills
✅ 5+ years AWS (Glue, Step Functions, RDS)
✅ Expert PySpark for complex ETL
✅ PostgreSQL (including RLS)
✅ JDBC connections (Oracle, MySQL)
✅ REST API integration (OAuth, pagination)
✅ Infrastructure as Code
✅ Data quality frameworks
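As a sense-check for the RLS skill listed above, here is one hedged sketch of a tenant-scoped Row-Level Security policy, generated as DDL from Python. Table, role, column, and setting names are invented for illustration; the real DDL would ship in the deliverable scripts.

```python
# Sketch of a Row-Level Security policy for one master table.
# All identifiers (master_orders, analyst, tenant_id, app.tenant_id)
# are hypothetical examples, not names from this project.
def rls_policy_ddl(table: str, role: str, tenant_column: str) -> str:
    """Return DDL enabling RLS and restricting a role to its own tenant's rows."""
    return "\n".join([
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        f"CREATE POLICY {table}_{role}_policy ON {table}",
        f"  FOR SELECT TO {role}",
        f"  USING ({tenant_column} = current_setting('app.tenant_id'));",
    ])

# Applied through any PostgreSQL driver, e.g.:
#   cur.execute(rls_policy_ddl("master_orders", "analyst", "tenant_id"))

if __name__ == "__main__":
    print(rls_policy_ddl("master_orders", "analyst", "tenant_id"))
```

The `current_setting(...)` pattern lets the application set the tenant per session, so the same policy serves every role-based access case.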
Highly Desirable
Microsoft Dataverse/Dynamics 365 API experience
AWS Glue Data Catalog
CI/CD pipelines
GDPR compliance experience
Timeline & Budget
Duration: TBD
Availability: TBD
Payment: Milestone-based
Budget: Negotiable, based on experience
To Apply, Provide:
Portfolio: Links to similar AWS ETL projects (GitHub preferred)
Brief approach: How would you architect this pipeline? (200 words)
Dataverse experience: Have you worked with Dataverse API? Describe briefly.
Availability: Start date and weekly hours
Rate: Fixed-price proposal
Questions
Largest data volume processed with AWS Glue? Optimization techniques used?
Experience with PostgreSQL Row-Level Security?
Terraform or CloudFormation preference and why?
Ideal Candidate
Built production ETL pipelines on AWS for enterprise clients
Comfortable with complex transformations and business logic
Writes clean, testable, maintainable code
Works independently, communicates proactively
Can deliver production-quality work with minimal supervision
Location: Remote (EU timezone preferred)
Client: AWS Advanced Partner delivering for a manufacturing client in Greece
Region: EU (Frankfurt) for GDPR compliance
Tags: AWS Glue, PySpark, ETL, PostgreSQL, Data Pipeline, AWS Step Functions, Python, Oracle, MySQL, Dataverse, Terraform, Data Engineering
Apply Now