Role Summary
We are hiring a Data Engineer to build and optimize scalable data pipelines and Lakehouse solutions on AWS and Databricks for a leading organization in the banking/financial services domain.
This is an excellent opportunity to work on enterprise-grade data platforms with stringent security, governance, and performance requirements.
Key Responsibilities
- Design, develop, and maintain ETL/ELT pipelines on Databricks using PySpark, Spark SQL, and Delta Lake (a minimal sketch follows this list)
- Build reliable ingestion frameworks using AWS services:
  - S3, Glue, Lambda, Step Functions
  - Kafka/MSK or Kinesis (streaming ingestion)
  - Integration with on-prem databases / RDS / Redshift
- Automate workflows using Databricks Workflows, Airflow, or similar tools
- Optimize Lakehouse performance (partitioning, Delta optimization, cost/compute tuning)
- Implement data quality checks, monitoring, and incident troubleshooting for production pipelines (see the second sketch below)
- Apply governance and security controls (PII protection, access control, audit readiness)
- Collaborate with data analysts/scientists and business stakeholders to deliver trusted datasets
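
For illustration only, here is a minimal sketch of the kind of batch ETL pipeline described above, written in PySpark with Delta Lake. The bucket paths, table layout, and column names are hypothetical placeholders, not a prescribed implementation:

```python
# Minimal batch ETL sketch: raw CSV in S3 -> curated Delta table.
# All paths and column names below are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transactions_etl").getOrCreate()

# Extract: read raw files landed in S3 (hypothetical bucket/prefix)
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-raw-bucket/transactions/")
)

# Transform: basic typing, date parsing, and deduplication
cleaned = (
    raw.withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
       .dropDuplicates(["txn_id"])
)

# Load: write a Delta table, partitioned by date for partition pruning
(
    cleaned.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("txn_date")
    .save("s3://example-curated-bucket/delta/transactions/")
)
```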
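And a second minimal sketch of the kind of data quality gate mentioned above, again with hypothetical paths and rules; in practice this could be replaced by Delta Live Tables expectations or a dedicated data quality framework:

```python
# Minimal data quality gate sketch: validate a Delta table before publishing.
# Table location and check rules are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

df = spark.read.format("delta").load("s3://example-curated-bucket/delta/transactions/")

total = df.count()
null_ids = df.filter(F.col("txn_id").isNull()).count()
negative_amounts = df.filter(F.col("amount") < 0).count()

# Fail the job (so the orchestrator can alert) if any check is violated
if total == 0 or null_ids > 0 or negative_amounts > 0:
    raise ValueError(
        f"Data quality check failed: rows={total}, "
        f"null_ids={null_ids}, negative_amounts={negative_amounts}"
    )
```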
Requirements
Must-have
- 2-5+ years of Data Engineering experience
- Strong hands-on experience with:
  - AWS (S3, Glue, Lambda, IAM, Step Functions)
  - Databricks (PySpark, Delta Lake, Workflows)
  - Python and SQL
- Good understanding of data modeling (relational/dimensional)
- Experience working in large-scale distributed data environments
- Good English communication skills
Nice-to-have
- Banking/financial domain experience (core banking, payments, lending, reporting)
- Streaming experience (Kafka/MSK, Kinesis, Spark Structured Streaming)
- Governance/catalog tools (Unity Catalog preferred)
- IaC experience (Terraform / AWS CDK)
- Familiarity with compliance and security requirements in regulated environments