Overview

I use "engineering" as a catch-all for every form of engineering; there are too many job titles. My responsibilities, and those of other engineers, span the gamut:

Data Engineering -> Managing ETL pipelines
ML Engineering -> Monitoring ML model performance
Analytics Engineering -> Building DBT models
Software Engineering -> Creating front-end applications or APIs
DevOps Engineering -> Updating Jenkins CI/CD pipelines
Platform Engineering -> Managing cloud infrastructure
AI Engineering -> Developing and monitoring (multimodal) LLM applications

In the end, engineers solve problems with a given toolkit under a given set of constraints.

Apache Iceberg

BigQuery now supports Apache Iceberg tables, but defining one requires pre-release features (namely, calling a PySpark stored procedure).

In short, Apache Iceberg is a table format that separates the physical layer (data files such as Parquet) from the logical layer (the table). Iceberg's rich metadata layer lets it manage large data assets efficiently, providing transactions, schema evolution, scalability, compression, and time travel.
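
To make the "metadata over immutable files" idea concrete, here is a toy model in plain Python. It is NOT the Iceberg API and omits manifests, partition specs, and catalogs; the class name, file paths, and columns are hypothetical. It only illustrates the ideas above under simplifying assumptions: every commit writes a new snapshot pointing at immutable data files, schema changes are metadata-only because columns carry stable field IDs, and time travel is just reading an older snapshot.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    schema_id: int               # which schema version was current at commit
    data_files: tuple[str, ...]  # physical layer: immutable file paths


class ToyIcebergTable:
    """Hypothetical toy: a logical table is metadata pointing at physical files."""

    def __init__(self, columns: list[str]) -> None:
        # Schema history; each column gets a stable field ID, so adding a
        # column never requires rewriting existing data files.
        self.schemas: list[tuple[tuple[int, str], ...]] = [
            tuple(enumerate(columns, start=1))
        ]
        self.snapshots: list[Snapshot] = []

    def append(self, new_files: list[str]) -> Snapshot:
        # A commit creates a NEW snapshot listing the previous files plus the
        # new ones; existing snapshots are never modified.
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            schema_id=len(self.schemas) - 1,
            data_files=previous + tuple(new_files),
        )
        self.snapshots.append(snap)
        return snap

    def add_column(self, name: str) -> int:
        # Schema evolution as a metadata-only change: record a new schema version.
        current = self.schemas[-1]
        next_id = max((fid for fid, _ in current), default=0) + 1
        self.schemas.append(current + ((next_id, name),))
        return len(self.schemas) - 1

    def scan(self, snapshot_id: int | None = None) -> tuple[list[str], tuple[str, ...]]:
        # Time travel: resolve a snapshot (latest by default) and return the
        # columns of its schema plus the files it references.
        if not self.snapshots:
            raise LookupError("table has no snapshots yet")
        if snapshot_id is None:
            snap = self.snapshots[-1]
        else:
            matches = [s for s in self.snapshots if s.snapshot_id == snapshot_id]
            if not matches:
                raise KeyError(f"unknown snapshot {snapshot_id}")
            snap = matches[0]
        columns = [name for _, name in self.schemas[snap.schema_id]]
        return columns, snap.data_files


if __name__ == "__main__":
    table = ToyIcebergTable(["id", "name"])
    table.append(["data/00001.parquet"])  # snapshot 1
    table.add_column("email")             # new schema version, no files rewritten
    table.append(["data/00002.parquet"])  # snapshot 2
    print(table.scan(1))  # reads as of snapshot 1
    print(table.scan())   # reads the latest snapshot
```

Because snapshots are append-only, a reader holding snapshot 1 is never disturbed by the later commit; real Iceberg builds its atomic commits on this same property.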

Tabular is a commercial project from Iceberg's creators.

DBT

I recently started learning and using dbt at work, and there's plenty of great documentation. It's a clever framework for a very common use case of ours: BigQuery-to-BigQuery transformations managed with Python and Jinja templating.

These are the three big concepts I sell to the team to improve adoption:

Lineage <-- GCP/BigQuery offers a feature, but it's not turned on
Modularity <-- Complementary to BigQuery artifacts, less heavy-handed
Declarative SQL <-- Especially for testing, specify column & test to run
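
The declarative-testing point can be sketched in a few lines: you declare which tests apply to which column, and each test is rendered into a query that returns failing rows, so zero rows means pass. This is loosely modeled on dbt's built-in unique and not_null tests, but the spec format, the templates, the table, and the data are hypothetical simplifications, and SQLite stands in for BigQuery so it runs locally.

```python
from __future__ import annotations

import sqlite3

# Simplified templates in the spirit of dbt's generic tests: each query
# selects the rows that VIOLATE the test.
TEST_TEMPLATES = {
    "not_null": "select {col} from {model} where {col} is null",
    "unique": (
        "select {col} as unique_field, count(*) as n_records "
        "from {model} where {col} is not null "
        "group by {col} having count(*) > 1"
    ),
}


def compile_tests(model: str, spec: dict[str, list[str]]) -> dict[str, str]:
    """Render one SQL query per (column, test) pair declared in the spec."""
    return {
        f"{test}_{model}_{col}": TEST_TEMPLATES[test].format(model=model, col=col)
        for col, tests in spec.items()
        for test in tests
    }


def run_tests(conn: sqlite3.Connection, queries: dict[str, str]) -> dict[str, int]:
    """Return the number of failing rows per test (0 means the test passes)."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in queries.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customers (customer_id integer, email text)")
    conn.executemany(
        "insert into customers values (?, ?)",
        [(1, "a@example.com"), (2, None), (2, "b@example.com")],
    )
    # The declarative part: column -> tests to run.
    spec = {"customer_id": ["unique", "not_null"], "email": ["not_null"]}
    for name, failures in run_tests(conn, compile_tests("customers", spec)).items():
        status = "PASS" if failures == 0 else f"FAIL ({failures} rows)"
        print(f"{name}: {status}")
```

The appeal is that the spec stays short and readable while the SQL is generated, which is the same trade dbt makes with its YAML test declarations.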

Running dbt compile lets you view the SQL code that each model renders to.
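
A rough sketch of what compiling buys you: each ref() in a model is resolved to a fully qualified table name, and the same ref() calls double as the lineage graph that decides run order. Real dbt renders models with a full Jinja environment and per-model schema config; here a regex handles only the simplest `{{ ref('name') }}` form, and the project, dataset, and model names are hypothetical.

```python
from __future__ import annotations

import re
from graphlib import TopologicalSorter

# Matches {{ ref('name') }} or {{ ref("name") }} only; a simplification of
# dbt's real Jinja rendering.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def compile_model(sql: str, project: str, dataset: str) -> str:
    """Replace each ref() with a fully qualified BigQuery table name."""
    return REF_PATTERN.sub(lambda m: f"`{project}.{dataset}.{m.group(1)}`", sql)


def build_lineage(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model to the upstream models it ref()s."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(models: dict[str, str]) -> list[str]:
    """Order models so every upstream model runs before its dependents."""
    return list(TopologicalSorter(build_lineage(models)).static_order())


if __name__ == "__main__":
    models = {  # hypothetical models
        "stg_orders": "select * from `raw.shop.orders`",
        "stg_customers": "select * from `raw.shop.customers`",
        "customer_orders": (
            "select c.customer_id, count(o.order_id) as n_orders "
            "from {{ ref('stg_customers') }} c "
            "left join {{ ref('stg_orders') }} o using (customer_id) "
            "group by c.customer_id"
        ),
    }
    print(compile_model(models["customer_orders"], "my-project", "analytics"))
    print(build_lineage(models))
    print(run_order(models))
```

Because dependencies are declared through ref() rather than hard-coded table names, lineage comes for free, which is part of why it makes a good selling point even when the platform's own lineage feature is off.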

The dbt Viewpoint is a great introduction to the project. I've started to apply this mission statement to the projects I build:

Here's the challenge
Here's how I think about the problem
Here's my vision
