APPLY modifier, making it incredibly useful for data analysis and transformation tasks.
We’re going to learn how to use this feature with help from the New York taxis dataset, which you can also find in the ClickHouse SQL playground.
Selecting columns that match a pattern
Let’s start with a common scenario: selecting only the columns that contain_amount from the NYC taxi dataset. Instead of manually typing each column name, we can use the COLUMNS expression with a regular expression:
Try this query in the SQL playgroundThis query returns the first 10 rows, but only for columns whose names match the pattern
.*_amount (any characters followed by “_amount”).
fee or tax.
We can update the regular expression to include those:
Try this query in the SQL playground
Selecting multiple patterns
We can combine multiple column patterns in a single query:Try this query in the SQL playground
Apply functions to all columns
We can also use theAPPLY modifier to apply functions across every column.
For example, if we wanted to find the maximum value of each of those columns, we could run the following query:
Try this query in the SQL playground
Try this query in the SQL playground
Try this query in the SQL playground
APPLY modifier accepts a lambda, which gives us the flexibility to have the round function round our average values to 2 decimal places:
Try this query in the SQL playground
Replacing columns
So far so good. But let’s say we want to adjust one of the values, while leaving the other ones as they’re. For example, maybe we want to double the total amount and divide the MTA tax by 1.1. We can do that by using theREPLACE modifier, which will replace a column while leaving the other ones as they’re.
Try this query in the SQL playground
Excluding columns
We can also choose to exclude a field by using theEXCEPT modifier. For example, to remove the tolls_amount column, we would write the following query:
Try this query in the SQL playground