Part 2/2: Data Pipelines, Documentation and Lineage with SQL & dbt

SQL is an integral part of data analysis: it is comparatively easy to learn and runs directly in the database. As a result, it is popular with data analysts. In practice, a common pattern is Python glue code in notebooks that executes SQL statements.

Oct 24
Alfândega Porto Congress Centre
2 hours
14:00 - 16:00 UTC
Matthias Niehoff

dbt (data build tool) is a command-line tool for building SQL data pipelines in a structured way. It also lets you validate your data. Its output is not only tables in a database, but also documentation and dependency graphs. This helps with more than just preparing data: the regular analyses and evaluations that follow can be conveniently automated as well, including traceability of which analyses use which data. And if the source data contains errors, the analyses built on it are not updated at all.

With dbt, even a developer can have fun with SQL & data transformations!

We will cover:
- sources & models
- documentation & lineage
- testing your data & enforcing contracts
- extending dbt with plugins
- how-to: CI/CD and deployment

This workshop gives you a kickstart for building structured data pipelines that power your analytics.

Matthias Niehoff

Head of Data @ codecentric

Matthias Niehoff works as Head of Data and Data Architect at codecentric AG, where he supports customers in designing and implementing data architectures. His focus is less on the ML model itself and more on the infrastructure and organization that data science projects need to succeed.