The Hadoop Data Engineer is responsible for designing, developing, and maintaining large-scale data processing systems within a distributed Hadoop ecosystem. The role focuses on enabling data-driven decision-making across banking operations, risk management, compliance, and customer analytics.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using Hadoop ecosystem tools (HDFS, Hive, Spark, Sqoop, Kafka).
- Build and optimize ETL/ELT processes to support data ingestion from multiple banking systems.
- Develop and manage big data solutions for structured and unstructured data.
- Collaborate with data analysts, data scientists, and business stakeholders to deliver data solutions.
- Ensure data quality, integrity, and governance aligned with banking and regulatory standards.
- Tune and optimize the performance of Hadoop/Spark jobs.
- Implement data security controls to comply with financial regulations and standards (e.g., PCI DSS, SOX).
- Support real-time and batch data processing frameworks.
- Troubleshoot production issues and provide continuous support for data platforms.
- Work with cloud platforms (e.g., AWS, Azure) for modern data solutions.
Required Skills & Qualifications
Technical Skills
- Strong experience with:
  - Hadoop ecosystem (HDFS, MapReduce, Hive, HBase)
  - Apache Spark (Scala/Python)
  - SQL & NoSQL databases
  - ETL tools (Informatica, Talend, or similar)
  - Kafka or other streaming tools
- Proficiency in programming:
  - Python / Java / Scala
- Experience with:
  - Data warehousing concepts
  - Workflow orchestration tools (Airflow, Oozie)
  - Unix/Linux environments
- Knowledge of cloud data platforms (AWS EMR, Azure Data Lake) is a plus
Domain Knowledge
- Understanding of banking and financial services data
- Exposure to risk, compliance, or fraud analytics is preferred
Soft Skills
- Strong problem-solving and analytical abilities
- Excellent communication and collaboration skills
- Ability to work in Agile/Scrum environments
Education & Experience
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field
- Typically 5–10 years of experience in data engineering or big data development