- Assist in maintaining and further stabilizing an existing data platform hosted on Azure.
- The pipeline consists of various core components that are measured using KPAs.
- Components range from custom Airflow operators and Python microservices to Spark jobs written in PySpark.
- Data is ingested from various third-party marketing and analytics platforms.
- Assist in implementing solutions and fixes to components that lead to KPA improvements.
- Help debug data anomalies and track down bugs in existing components.
- Assist in mitigating the impact of third-party API changes on the system.
- Work with the Features team to ensure that system changes do not negatively impact KPAs.
- Assist in defining new KPAs that improve visibility of existing components when required.
- Python 4/5
- Django 3/5
- Kubernetes (Azure AKS) 3/5
- Apache Airflow 1 4/5
- Spark (PySpark, executing on k8s) 4/5
- Hive 3/5
- HDFS 3/5
- Postgres 3/5
- SQL 3/5
- Docker 3/5
- Understanding of both ORC and Parquet formats 3/5
- Azure Blob Storage 2/5
- Microservice Architecture 3/5
- ETL pipeline design/architecture/implementation concepts 4/5
- Schema management/Schema evolution 3/5
- Data warehousing 3/5
- Git 3/5
(The numbers next to the skills are the minimum expected level out of 5.)
- Apache Airflow 2
- Azure Data Lake Storage Gen2
- Synapse dedicated and serverless SQL pools (or Azure SQL Data Warehouse experience)
- Grafana, Loki, Prometheus (understanding monitoring tools and being able to source error logs, etc.)