🔹 User Story 2 – Analyzing 3 Years of Clickstream Logs
📌 Problem:
You want to run a query like:
“Show me how many users searched for laptops in the last 3 years, grouped by month.”
The data is huge (terabytes to petabytes) and partitioned by date, device, and region.
✅ Solution with Apache Iceberg
Iceberg supports partition evolution → you can change the partitioning over time without rewriting old data; existing files keep the layout they were written with.
Example:
Year 1: Partition by date.
Year 2: Partition by date + region.
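A single query can still span both schemes: the scan is planned separately for each partition layout, so the query itself does not need to know when the partitioning changed.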
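To make the Year 1 / Year 2 example concrete, here is a minimal Python sketch of how a scan could prune files when a table holds data written under two partition layouts. It is a toy model, not Iceberg's API: `DataFile`, `SPECS`, `FILES`, and `plan_scan` are hypothetical names, and the file paths and row counts are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Toy model only: these names are hypothetical and are not Iceberg's API.


@dataclass
class DataFile:
    path: str
    spec_id: int      # which partition layout the file was written under
    partition: dict   # partition values stored for this file
    row_count: int


# Spec 0 (year 1): partitioned by date. Spec 1 (year 2): date + region.
SPECS = {0: ("date",), 1: ("date", "region")}

FILES = [
    DataFile("y1/2022-03-01.parquet", 0, {"date": date(2022, 3, 1)}, 100),
    DataFile("y1/2022-03-02.parquet", 0, {"date": date(2022, 3, 2)}, 120),
    DataFile("y2/2023-03-01/eu.parquet", 1,
             {"date": date(2023, 3, 1), "region": "eu"}, 80),
    DataFile("y2/2023-03-01/us.parquet", 1,
             {"date": date(2023, 3, 1), "region": "us"}, 90),
]


def plan_scan(files, filters):
    """Keep files whose partition values may match the equality filters.

    A filter on a column that is not part of a file's partition layout
    cannot prune that file, so the file is kept and its rows must be
    filtered later by the query engine.
    """
    selected = []
    for f in files:
        fields = SPECS[f.spec_id]
        if all(f.partition[col] == val
               for col, val in filters.items() if col in fields):
            selected.append(f)
    return selected


# Filtering on region prunes year-2 files; year-1 files are all kept.
scan = plan_scan(FILES, {"region": "eu"})
print([f.path for f in scan])
```

In this sketch the region filter skips the year-2 `us` file, while both year-1 files are still scanned because they were never partitioned by region. With Spark and the Iceberg SQL extensions enabled, the layout change itself is typically a metadata-only statement along the lines of `ALTER TABLE ... ADD PARTITION FIELD region`.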
Iceberg snapshots let you query the table exactly as it was at a given point in time (time travel), as long as that snapshot has not been expired. This is great for reproducible analytics.
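The snapshot idea can be sketched in a few lines of Python. Again this is a toy model under stated assumptions, not Iceberg's metadata format or API: `SNAPSHOTS` and `files_as_of` are hypothetical, and each snapshot is simplified to a commit time plus the list of data files that made up the table at that moment.

```python
from bisect import bisect_right
from datetime import datetime

# Toy model only: hypothetical structures, not Iceberg's metadata or API.
# Snapshots are kept in commit order, oldest first.
SNAPSHOTS = [
    (datetime(2023, 1, 1), ["a.parquet"]),
    (datetime(2023, 6, 1), ["a.parquet", "b.parquet"]),
    (datetime(2024, 1, 1), ["b.parquet", "c.parquet"]),  # a.parquet replaced
]


def files_as_of(snapshots, ts):
    """Return the file list of the latest snapshot committed at or before ts."""
    times = [committed for committed, _ in snapshots]
    idx = bisect_right(times, ts)
    if idx == 0:
        raise ValueError("no snapshot exists at or before the requested time")
    return snapshots[idx - 1][1]


# A report pinned to mid-September 2023 always reads the same two files,
# even after later commits change the table.
report_files = files_as_of(SNAPSHOTS, datetime(2023, 9, 15))
print(report_files)
```

Re-running the report with the same timestamp returns the same file list, which is what makes the analysis reproducible. In Spark SQL on an Iceberg table, the equivalent is usually expressed with `TIMESTAMP AS OF` or `VERSION AS OF` in the `SELECT` statement.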
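👉 This helps keep queries on massive historical logs fast, because partition pruning skips files that cannot match, and old data does not have to be rewritten when the partitioning changes.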