- minmax: Tracks the minimum and maximum value of an expression per block. Ideal for range queries on loosely sorted data.
- set(N): Tracks a set of values up to a specified size N for each block. Effective on columns with low cardinality per blocks.
- text: Builds an inverted index over tokenized string data, enabling efficient and deterministic full-text search. Recommended for natural language or large free-form text columns where precise token lookup and scalable multi-term search are required, instead of approximate Bloom filter–based approaches.
- bloom_filter: Probabilistically determines if a value exists in a block, allowing fast approximate filtering for set membership. Effective for optimizing queries looking for the “needle in a haystack”, where a positive match is needed.
- tokenbf_v1 / ngrambf_v1: (Deprecated) Specialized Bloom filter variants designed for searching tokens or character sequences in strings — particularly useful for log data or text search use cases. Deprecated in ClickHouse versions >= 26.2 in favor of text indexes.
- Columns with high overall cardinality but low cardinality within a block.
- Rare values that are critical for search (e.g. error codes, specific IDs).
- Cases where filtering occurs on non-primary key columns with localized distribution.
- Test skip indexes on real data with realistic queries. Try different index types and granularity values.
- Evaluate their impact using tools like send_logs_level=‘trace’ and
EXPLAIN indexes=1to view index effectiveness. - Always evaluate the size of an index and how it is impacted by granularity. Reducing granularity size often will improve performance to a point, resulting in more granules being filtered and needing to be scanned. However, as index size increases with lower granularity, performance can also degrade. Measure the performance and index size for various granularity data points. This is particularly pertinent on bloom filter indexes.
Example
Consider the following optimized table. This contains Stack Overflow data with a row per post.EXPLAIN indexes = 1:
ViewCount is correlated with the CreationDate (a primary key) as one might expect — the longer a post exists, the more time it has to be viewed.
ALTER TABLE commands — first adding it, then “materializing it”.
ViewCount values for each block of rows (granule) in the table:
Repeating our earlier query shows significant performance improvements. Notice the reduced number of rows scanned:
EXPLAIN indexes = 1 confirms use of the index.
ViewCount > 10,000,000 predicate in our example query: