Role Overview
We are seeking a Databricks Data Architect to support the design, implementation, and optimization of cloud-native data platforms built on the Databricks Lakehouse Architecture. This is a hands-on, engineering-driven role requiring deep experience with Apache Spark, Delta Lake, and scalable data pipeline development, combined with early-stage architectural responsibilities.
The role involves close onsite collaboration with client stakeholders, translating analytical and operational requirements into robust, high-performance data architectures, while adhering to best practices for data modeling, governance, reliability, and cost efficiency.
Key Responsibilities
· Design, develop, and maintain batch and near-real-time data pipelines using Databricks, PySpark, and Spark SQL
· Implement Medallion (Bronze/Silver/Gold) Lakehouse architectures, ensuring proper data quality, lineage, and transformation logic across layers
· Build and manage Delta Lake tables, including schema evolution, ACID transactions, time travel, and optimized data layouts
· Apply performance optimization techniques such as partitioning strategies, Z-Ordering, caching, broadcast joins, and Spark execution tuning
· Support dimensional and analytical data modeling for downstream consumption by BI tools and analytics applications
· Assist in defining data ingestion patterns (batch, incremental loads, CDC, and streaming where applicable)
· Troubleshoot and resolve pipeline failures, data quality issues, and Spark job performance bottlenecks
Nice-to-Have Skills
· Exposure to Databricks Unity Catalog, data governance, and access control models
· Experience with Databricks Workflows, Apache Airflow, or Azure Data Factory for orchestration
· Familiarity with streaming frameworks (Spark Structured Streaming, Kafka) and/or CDC patterns
· Understanding of data quality frameworks, validation checks, and observability concepts
· Experience integrating Databricks with BI tools such as Power BI, Tableau, or Looker
· Awareness of cost optimization strategies in cloud-based data platforms
· Prior life sciences domain experience