Code Security Python Package
Code Security for Python Package # audit uv add --dev pip-audit pip-audit # available versions pip index versions urllib3 # installed version uv pip freeze | grep urllib3 poetry show urllib3 # why package is installed uv tree --invert --package urllib3 poetry show --tree --why urllib3 # installs and adds to pyproject.toml uv add urllib3~=1.2.3 poetry add urllib3~=1.2.3 # to skip versions with known errors uv add "urllib3>=1.2.3,<1.3.0,!=1.2.4" poetry add "urllib3>=1.
Data Model SCD Type 2, Vault, Kimball, Inmon
SCD Type 2, 6NF/7NF, Data Vault SCD Type 2: One big table, all attributes together, duplicates on change. ProductID | Name | Category | Price | ValidFrom | ValidTo -----------+---------+-------------+-------+------------+---------- 456 | iPhone9 | Electronics | 1000 | 2020 | 2021 456 | iPhone9 | Electronics | 900 | 2021 | 2022 456 | iPhone9 | Computers | 900 | 2022 | NULL 6NF/7NF: Each table per attribute ProductID | Name | ValidFrom | ValidTo -----------+---------+------------+---------- 456 | iPhone9 | 2020 | NULL ProductID | Price | ValidFrom | ValidTo -----------+-------+------------+---------- 456 | 1000 | 2020 | 2021 456 | 900 | 2021 | NULL ProductID | Category | ValidFrom | ValidTo -----------+-------------+------------+---------- 456 | Electronics | 2020 | 2022 456 | Computers | 2022 | NULL Data Vault: Source-based grouping ProductID | BusinessKey -----------+------------- 456 | iPhone9 ProductID | Category | Price | Source | LoadDate -----------+-------------+-------+------------------+---------- 456 | Electronics | 1000 | Dell ERP | 2020 456 | Electronics | 900 | Distributor Feed | 2021 456 | Electronics | 900 | Dell ERP | 2021 456 | Computers | 900 | Dell ERP | 2022 Inmon vs Kimball Inmon builds Kimball-like views in Data Mart from core 3NF EDW
dbt Core, SQLMesh, Fivetran, Spark Declarative Pipelines
Data Wars The Phantom Menace: dbt Core Attack of the Clones: SQLMesh Revenge of the Sith: Fivetran A New Hope: Spark Declarative Pipelines
Data Engineering Hard Things
Data Engineering Hard Things There are only two hard things in Data Engineering: schema changes, source of truth, and off-by-one counts.
Oh wait, that’s three things.
LIMIT Effectiveness
LIMIT Effectiveness PostgreSQL: row-store Snowflake: micro-partition pruning Iceberg: rich metadata parquet pruning Delta Lake: basic metadata parquet pruning BigQuery: columnar, MPP
Python Structural Pattern Match
Python Structural Pattern Match match in Python 3.10+ is much more powerful than traditional switch/case in other languages.
For example, to merge overlapping intervals [[1,3],[2,6],[8,10],[10,15]] into [[1,6],[8,15]].
Without match With match intervals.sort() result = [] for interval in intervals: if not result: result.append(interval) # repeated, unclear index access elif result[-1][1] >= interval[0]: result[-1][1] = max( result[-1][1], interval[1] ) else: result.append(interval) intervals.sort() result = [] for interval in intervals: match (result, interval): case ([], _): result.
Python Functions with Multi-Valued Parameters
Python Functions with Multi-Valued Parameters str.strip() takes a string that is treated as a set of characters.
>>> " Hello world! ".strip(" !") 'Hello world' str.startswith and str.endswith accept a tuple of values.
>>> files = ["a/file1.md", "b/file2.py", "c/file3.txt"] >>> [f for f in files if f.startswith(("a/", "b/"))] ['a/file1.md', 'b/file2.py'] >>> [f for f in files if f.endswith(("md", "txt"))] ['a/file1.md', 'c/file3.txt'] re.split takes a string that is actually a regex pattern.
VSCode tips
Save history When AI agent gives up and rolls back all session changes before committing to Git, the savior is local saves in VSCode Timeline.
BigQuery Pipe syntax examples
BigQuery Pipe syntax examples with products (product_id, price) as ( select 'p1', 5 union all select 'p2', cast(NULL as integer) ) select * from products |> aggregate count(*) as count_all, count(price) as count_not_null |> extend if(count_all = count_not_null, 'PASS', 'FAIL') as test_result -- count_all count_not_null test_result -- 2 1 FAIL
A Halloween data horror story
A Halloween data horror story 🎃 We were migrating system U to system P.
Q: How to calculate user onboard duration?
A: We don’t, we copy the value from [deprecated system] field XYZ_OBD_WK
Q: Can we ask the developer for that calculation?
A: He/she has left 2 years ago.
Q: Can we see the code?
A: Here it is - a single 5k-line stored procedure 😱