Hive-style partition done wrong
Wrong Hive-style partition
More than once I have seen this nicely-done partition structure, only to be ruined by having different file types in the same place.
s3://bucket/
year=2023/
year=2024/
..
month=07/
month=08/
raw.csv
processed.csv
Spark/Glue or Presto/Trino/Athena would have worked directly on top of this structure, if there is a single file type either raw or processed.
The other file types are to be put in separate buckets with similar structures.
Right Hive-style partition
s3://raw-bucket/
year=2024/
month=08/
raw.csv
s3://processed-bucket/
year=2024/
month=08/
processed.csv
Last modified on 2024-08-27