VSCode tips
Save history When AI agent gives up and rolls back all session changes before committing to Git, the savior is local saves in VSCode Timeline.
BigQuery Pipe syntax examples
BigQuery Pipe syntax examples with products (product_id, price) as ( select 'p1', 5 union all select 'p2', cast(NULL as integer) ) select * from products |> aggregate count(*) as count_all, count(price) as count_not_null |> extend if(count_all = count_not_null, 'PASS', 'FAIL') as test_result -- count_all count_not_null test_result -- 2 1 FAIL
A Halloween data horror story
A Halloween data horror story 🎃 We were migrating system U to system P.
Q: How to calculate user onboard duration?
A: We don’t, we copy the value from [deprecated system] field XYZ_OBD_WK
Q: Can we ask the developer for that calculation?
A: He/she has left 2 years ago.
Q: Can we see the code?
A: Here it is - a single 5k-line stored procedure 😱
Bash command status check
Bash command status check Do not check for command output - it might fail to detect failure Check for command return value instead run_ls() { ls "$@" || { echo 'RUN_FAILED'; } } # missing \ at the end of line below CMD_OUTPUT=$(run_ls -a ) if [[ "$CMD_OUTPUT" == *"RUN_FAILED"* ]]; then echo 'Fail' else echo 'Pass' fi ~ ❯ ./test.sh ./test.sh: line 8: -a: command not found Pass run_ls() { ls "$@" || { return 1; } } # missing \ at the end of line below CMD_OUTPUT=$(run_ls -a ) CMD_STATUS=$?
dbt Model Versioning
dbt model versioning to avoid breaking downstream consumers Models A -> B -> Consumers.
There is a major change in B to be observed on production for some time before publishing to consumers.
Versioning model B is the perfect approach for this use case to test-drive new logic while limiting code changes to a single model.
Existing consumers:
SELECT * FROM b;
New logic data:
SELECT * FROM b_v2;
models: - name: b latest_version: 1 versions: - v: 1 defined_in: b config: alias: b - v: 2 defined_in: b_v2
Auto Daily LeetCode Notification
Automating Daily LeetCode Notifications My simple solution for daily LeetCode notifications:
💡 Tech Stack:
GitHub Actions: handles runtime and scheduling. curl: fetches data from GraphQL. jq: extracts problem details. grep: derives new or solved status. JS: creates GitHub issues to trigger email notifications. flat file: extracts and stores solved problems. ❌ Avoided:
Airflow, Python, Kafka, Redis, and S3
See it in action
Auto-append solved problems echo "message=$(git log -1 --pretty=%B)" >> $GITHUB_ENV if [[ $message =~ ^[0-9]+$ ]]; then echo "is_integer=true" >> $GITHUB_ENV fi
Python-dotenv not updated vars
Python-dotenv not updated vars python-dotenv by default does not override existing environment variables.
Always use load_dotenv(..., override=True) to make sure our configs are updated.
🔴 🔴 export FOO=gotcha echo 'FOO=BAR' > demo.env cat << EOF > demo.py from dotenv import load_dotenv import os load_dotenv('demo.env') print(os.environ['FOO']) EOF python demo.py gotcha echo $FOO gotcha 🟢 🟢 export FOO=gotcha echo 'FOO=BAR' > demo.env cat << EOF > demo2.py from dotenv import load_dotenv import os load_dotenv('demo.
BigQuery UNION syntax benefits
UNION with Fewer Columns LEFT UNION ALL BY NAME
with products_1 as ( -- primary source select 'p1' as product_id, 5 as price union all select 'p2', 10 ), products_2 as ( -- secondary source select 'p1' as product_id union all select 'p3' ), products_new as ( select p2.* from products_2 p2 left outer join products_1 p1 on p1.product_id = p2.product_id where p1.product_id is null ) select product_id, price from products_1 union all select product_id, null from products_new with products_1 as ( -- primary source select 'p1' as product_id, 5 as price union all select 'p2', 10 ), products_2 as ( -- secondary source select 'p1' as product_id union all select 'p3' ) select * from products_1 |> left union all by name ( -- products_new select p2.
BigQuery Pipe syntax benefits
End of Subquery Madness “pipe operator can see every alias that exists”
with products as ( select 'p1' as product_id, 5 as price union all select 'p2', 10 ), product_classification as ( select *, case when price > 5 then 'Expensive' else 'Cheap' end as classification from products ) select product_id, classification from product_classification where classification = 'Cheap' -- error without subquery with products as ( select 'p1' as product_id, 5 as price union all select 'p2', 10 ) select product_id, case when price > 5 then 'Expensive' else 'Cheap' end as classification from products where classification = 'Cheap' -- ERROR Unrecognized name: classification with products as ( select 'p1' as product_id, 5 as price union all select 'p2', 10 ) select * from products |> extend case when price > 5 then 'Expensive' else 'Cheap' end as classification |> where classification = 'Cheap' |> select product_id, classification
Column-level Lineage over Documentation
Where is source attribute A in the dataset? This question from Data Analysts always required constantly updated documentation or dedicated technical support. Column-level Lineage can answer this basic question if made available to Data Analysts.
It will improve collaboration between Data Engineers and Data Analysts, free Data Engineers from those basic questions and let them focus on actual building tasks.
Illustration taken from SQLMesh.
See it for yourself dbt Cloud Enterprise https://docs.