Type optimization
The general approach to optimizing storage efficiency is to use optimal data types. Let’s take the project and subproject columns. These columns are of type String but have a relatively small number of unique values:
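As a rough sketch, the cardinality could be checked and the columns converted along these lines. The table name wikistat is an assumption for illustration, and LowCardinality(String) is one common choice for string columns with few distinct values, not necessarily the only fit.

```sql
-- Assumed table name: wikistat (hypothetical; substitute your own).
-- Count distinct values to see whether the columns are low-cardinality.
SELECT
    uniq(project)    AS unique_projects,
    uniq(subproject) AS unique_subprojects
FROM wikistat;

-- If the counts are small, dictionary-encode the columns.
ALTER TABLE wikistat
    MODIFY COLUMN project LowCardinality(String),
    MODIFY COLUMN subproject LowCardinality(String);
```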
The same reasoning applies to the hits column, which takes 8 bytes but has a relatively small maximum value:
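A minimal sketch of that check follows, again assuming a table named wikistat (hypothetical). UInt32 is used here only as an example of a narrower type; it is safe only if the observed maximum, plus any expected growth, stays below 4,294,967,295.

```sql
-- Assumed table name: wikistat (hypothetical).
-- Find the largest value actually stored in the 8-byte column.
SELECT max(hits) AS max_hits
FROM wikistat;

-- If max_hits fits comfortably in 4 bytes, narrow the column type.
ALTER TABLE wikistat
    MODIFY COLUMN hits UInt32;
```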
Specialized codecs
When we deal with sequential data, like time-series, we can further improve storage efficiency by using special codecs. The general idea is to store changes between values instead of the absolute values themselves, which takes much less space when dealing with slowly changing data. A delta-based codec on the time column is a good fit for time-series data.
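The idea can be sketched as follows. The table name wikistat, the DateTime type of the time column, and the pairing of Delta with ZSTD are assumptions for illustration. Delta stores differences between neighboring values, and a general-purpose codec then compresses the resulting small numbers.

```sql
-- Assumed table name: wikistat; assumed column type: DateTime.
-- Delta keeps the difference between consecutive values,
-- ZSTD then compresses the (mostly small) differences.
ALTER TABLE wikistat
    MODIFY COLUMN time DateTime CODEC(Delta, ZSTD);
```

A changed codec applies to newly written data parts. Existing parts are recompressed when they are next merged.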
The right ordering key can also save disk space, because rows with similar values are stored next to each other and compress better.
Since we usually want to filter by path, we will add path to the sorting key.
This requires recreating the table.
Below we can see the CREATE command for our initial table and the optimized table:
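As a sketch only, the two definitions might look like the following. The table names wikistat and optimized_wikistat, the DateTime and UInt32 types, the codec choice, the MergeTree engine, and the initial sorting key are all assumptions for illustration, not the exact schema.

```sql
-- Initial table (assumed layout): plain String columns, 8-byte hits.
CREATE TABLE wikistat
(
    `time`       DateTime,
    `project`    String,
    `subproject` String,
    `path`       String,
    `hits`       UInt64
)
ENGINE = MergeTree
ORDER BY (time);

-- Optimized table (assumed layout): narrower types, a delta codec
-- on time, and path added to the sorting key.
CREATE TABLE optimized_wikistat
(
    `time`       DateTime CODEC(Delta, ZSTD),
    `project`    LowCardinality(String),
    `subproject` LowCardinality(String),
    `path`       String,
    `hits`       UInt32
)
ENGINE = MergeTree
ORDER BY (path, time);

-- Copy the data into the new table.
INSERT INTO optimized_wikistat SELECT * FROM wikistat;

-- Compare on-disk size of the two tables.
SELECT
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active AND table IN ('wikistat', 'optimized_wikistat')
GROUP BY table;
```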