Why AI is Bad at SQL but Good at DataFrame
Why AI is Bad at SQL but Good at DataFrame
| Logic | DataFrame | SQL |
|---|---|---|
| Filter before Agg | .filter().groupBy().agg() | (WHERE) AS cte, ... GROUP BY |
| Filter after Agg | .groupBy().agg().filter() | GROUP BY ... HAVING |
| Filter in Window | .withColumn().filter() | OVER() ... QUALIFY |
| Cross Join | .crossJoin(df2) | CROSS JOIN / , |
| Exists / SemiJoin | .join(df2, "id", "left_semi") | WHERE EXISTS (SELECT 1 FROM) |
| Not Exists / AntiJoin | .join(df2, "id", "left_anti") | WHERE NOT EXISTS / EXCEPT |
- For DataFrame, code is a stream of tokens in the exact same order of the logic, with the same consistent keywords.
- For SQL, both humans and AI must constantly look ahead or look back to place the right keyword in the right clause.
- For me, why must I keep learning new SQL syntax while I have been writing SQL since 2004!
Last modified on 2026-05-15