Session Outline

A technical deep dive into how my team builds, tests, quality-checks, and operates thousands of data pipelines. We leverage open source table formats and processing frameworks to deliver operational data nearline and to serve our ML feature store.

Key Takeaways

  • table streaming for cheap nearline (minutes) data processing
  • leveraging ML to find anomalies in streaming velocity across our datasets
  • query any dataset/stream via a fully automated metastore


Speaker Bio

Micha Kunze – Lead Data Engineer | Maersk

Micha is a passionate engineer and scientist, obsessed with automating himself out of his job. He is laser-focused on delivering value with data and machine learning. Developer productivity is a central concern for him, and he is constantly learning and improving his ways of working with data. Today he is the Lead Data Engineer of the Forecasting team at Maersk, responsible for delivering the data that supports the worldwide transportation of goods and that feeds the ML models optimizing the delivery of those goods. Micha holds a degree in physics and a PhD in biophysics, and spent more than a decade in academia using high performance computing before jumping into the data engineering arena. He understands the potential value of data, and extracting that value is what drives him.

October 25 @ 16:00
16:00 — 16:30 (30′)

