Code Security Python Package
Code Security for Python Package # audit uv add --dev pip-audit pip-audit # available versions pip index versions urllib3 # installed version uv pip freeze | grep urllib3 poetry show urllib3 # why package is installed uv tree --invert --package urllib3 poetry show --tree --why urllib3 # installs and adds to pyproject.toml uv add urllib3~=1.2.3 poetry add urllib3~=1.2.3 # to skip versions with known errors uv add "urllib3>=1.2.3,<1.3.0,!=1.2.4" poetry add "urllib3>=1.
2026-01-23    
Data Model SCD Type 2, Vault, Kimball, Inmon
SCD Type 2, 6NF/7NF, Data Vault SCD Type 2: One big table, all attributes together, duplicates on change. ProductID | Name | Category | Price | ValidFrom | ValidTo -----------+---------+-------------+-------+------------+---------- 456 | iPhone9 | Electronics | 1000 | 2020 | 2021 456 | iPhone9 | Electronics | 900 | 2021 | 2022 456 | iPhone9 | Computers | 900 | 2022 | NULL 6NF/7NF: Each table per attribute ProductID | Name | ValidFrom | ValidTo -----------+---------+------------+---------- 456 | iPhone9 | 2020 | NULL ProductID | Price | ValidFrom | ValidTo -----------+-------+------------+---------- 456 | 1000 | 2020 | 2021 456 | 900 | 2021 | NULL ProductID | Category | ValidFrom | ValidTo -----------+-------------+------------+---------- 456 | Electronics | 2020 | 2022 456 | Computers | 2022 | NULL Data Vault: Source-based grouping ProductID | BusinessKey -----------+------------- 456 | iPhone9 ProductID | Category | Price | Source | LoadDate -----------+-------------+-------+------------------+---------- 456 | Electronics | 1000 | Dell ERP | 2020 456 | Electronics | 900 | Distributor Feed | 2021 456 | Electronics | 900 | Dell ERP | 2021 456 | Computers | 900 | Dell ERP | 2022 Inmon vs Kimball Inmon builds Kimball-like views in Data Mart from core 3NF EDW
2026-01-22    
dbt Core, SQLMesh, Fivetran, Spark Declarative Pipelines
Data Wars The Phantom Menace: dbt Core Attack of the Clones: SQLMesh Revenge of the Sith: Fivetran A New Hope: Spark Declarative Pipelines
2026-01-17    
Data Engineering Hard Things
Data Engineering Hard Things There are only two hard things in Data Engineering: schema changes, source of truth, and off-by-one counts. Oh wait, that’s three things.
2026-01-10    
LIMIT Effectiveness
LIMIT Effectiveness PostgreSQL: row-store Snowflake: micro-partition pruning Iceberg: rich metadata parquet pruning Delta Lake: basic metadata parquet pruning BigQuery: columnar, MPP
2026-01-03    
Python Structural Pattern Match
Python Structural Pattern Match match in Python 3.10+ is much more powerful than traditional switch/case in other languages. For example, to merge overlapping intervals [[1,3],[2,6],[8,10],[10,15]] into [[1,6],[8,15]]. Without match With match intervals.sort() result = [] for interval in intervals: if not result: result.append(interval) # repeated, unclear index access elif result[-1][1] >= interval[0]: result[-1][1] = max( result[-1][1], interval[1] ) else: result.append(interval) intervals.sort() result = [] for interval in intervals: match (result, interval): case ([], _): result.
2026-01-02    
Python Functions with Multi-Valued Parameters
Python Functions with Multi-Valued Parameters str.strip() takes a string that is treated as a set of characters. >>> " Hello world! ".strip(" !") 'Hello world' str.startswith and str.endswith accept a tuple of values. >>> files = ["a/file1.md", "b/file2.py", "c/file3.txt"] >>> [f for f in files if f.startswith(("a/", "b/"))] ['a/file1.md', 'b/file2.py'] >>> [f for f in files if f.endswith(("md", "txt"))] ['a/file1.md', 'c/file3.txt'] re.split takes a string that is actually a regex pattern.
2025-12-31    
VSCode tips
Save history When AI agent gives up and rolls back all session changes before committing to Git, the savior is local saves in VSCode Timeline.
2025-11-28    
BigQuery Pipe syntax examples
BigQuery Pipe syntax examples with products (product_id, price) as ( select 'p1', 5 union all select 'p2', cast(NULL as integer) ) select * from products |> aggregate count(*) as count_all, count(price) as count_not_null |> extend if(count_all = count_not_null, 'PASS', 'FAIL') as test_result -- count_all count_not_null test_result -- 2 1 FAIL
2025-11-21    
A Halloween data horror story
A Halloween data horror story 🎃 We were migrating system U to system P. Q: How to calculate user onboard duration? A: We don’t, we copy the value from [deprecated system] field XYZ_OBD_WK Q: Can we ask the developer for that calculation? A: He/she has left 2 years ago. Q: Can we see the code? A: Here it is - a single 5k-line stored procedure 😱
2025-10-18