Data Engineer (Python)
Experience: 5–6 years | Location: Dhahran/Khobar, KSA — onsite only | Duration: 3 months (extension possible) | Availability: Immediate
Role Overview
Build the data backbone of the MRO Inventory Optimization solution — ingestion, cleansing, transformation, and the optimization logic that turns raw SAP material master and inventory data into actionable outputs. You'll own pipelines from source through to the analytics and application layers.
Must-Have — technical depth expected
- Python: Production-grade code, modular design, packaging, logging, config management, unit testing (pytest); strong grasp of data structures and performance.
- Pandas / NumPy: Vectorized transformations, joins/merges, groupby/aggregation, handling large datasets, deduplication, type coercion, working with messy real-world MRO/master data.
- Airflow: Authoring DAGs, operators/sensors, scheduling and backfills, task dependencies, retries/SLAs, idempotent pipeline design, parameterization.
- BigQuery: Writing performant SQL, partitioning/clustering, cost-aware querying, loading/exporting data, working with nested/repeated fields.
- SQL: Advanced joins, window functions, CTEs, aggregation, query optimization across relational and warehouse engines.
- API development: Building and consuming REST APIs (FastAPI/Flask), request validation, pagination, integration with upstream systems (e.g., SAP-sourced data via CPI/OData).
Good-to-Have
- PySpark: Distributed transforms.
- ML basics: Forecasting/classification relevant to inventory optimization (EOQ, demand forecasting, slow-moving/obsolete stock detection).
- Data quality frameworks: Great Expectations or similar.
- Docker, CI/CD.
Scope of Work
- Ingest data from SAP material master and inventory feeds (via API/OData) and other sources into the warehouse.
- Cleanse and process master data: standardize material descriptions, deduplicate and classify materials, and handle incomplete records.
- Build and orchestrate ETL pipelines (Airflow → BigQuery), ensuring reliability, idempotency, and data lineage.
- Implement inventory optimization logic (reorder points, safety stock, EOQ, criticality/ABC analysis, obsolescence flags).
- Develop backend services / APIs exposing processed data to the UI and BI layers.
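
To give a feel for the optimization logic named in the scope above, here is a minimal Python sketch of the textbook formulas for EOQ, safety stock, reorder point, and ABC classification. It is an illustration only, not the project's actual implementation: the function names, the 80%/95% ABC cut-offs, and the assumption that safety stock covers demand variability alone (with a fixed lead time) are all hypothetical choices made for this example.

```python
from __future__ import annotations

import math


def eoq(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Classic economic order quantity: sqrt(2 * D * S / H)."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)


def safety_stock(z: float, daily_demand_std: float, lead_time_days: float) -> float:
    """Safety stock for demand variability only: z * sigma_d * sqrt(L)."""
    return z * daily_demand_std * math.sqrt(lead_time_days)


def reorder_point(avg_daily_demand: float, lead_time_days: float, ss: float) -> float:
    """Expected demand over the lead time plus the safety stock buffer."""
    return avg_daily_demand * lead_time_days + ss


def abc_classify(
    annual_value: dict[str, float],
    a_cut: float = 0.80,  # assumed cut-off, not taken from the posting
    b_cut: float = 0.95,  # assumed cut-off, not taken from the posting
) -> dict[str, str]:
    """Label each material A/B/C by its cumulative share of annual value."""
    total = sum(annual_value.values())
    if total <= 0:
        # No consumption value to rank on; treat everything as class C.
        return {material: "C" for material in annual_value}
    classes: dict[str, str] = {}
    running = 0.0
    ranked = sorted(annual_value.items(), key=lambda kv: kv[1], reverse=True)
    for material, value in ranked:
        running += value
        share = running / total
        classes[material] = "A" if share <= a_cut else "B" if share <= b_cut else "C"
    return classes


if __name__ == "__main__":
    # Hypothetical material: 1,200 units/year, 50 per order, 3 per unit-year to hold.
    ss = safety_stock(z=1.65, daily_demand_std=4, lead_time_days=9)
    print("EOQ:", eoq(1200, 50, 3))
    print("Safety stock:", ss)
    print("Reorder point:", reorder_point(10, 9, ss))
    print("ABC:", abc_classify({"M-001": 700, "M-002": 200, "M-003": 60, "M-004": 40}))
```

In practice, logic like this would likely run as vectorized Pandas or BigQuery SQL over the full material master rather than per-item Python calls. A z-value of 1.65 corresponds to a service level of roughly 95%.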