Download the dataset
Run the command:Unpack the dataset
Menu— Information about menus: the name of the restaurant, the date when menu was seen, etc.Dish— Information about dishes: the name of the dish along with some characteristic.MenuPage— Information about the pages in the menus, because every page belongs to some menu.MenuItem— An item of the menu. A dish along with its price on some menu page: links to dish and menu page.
Create the tables
We use Decimal data type to store prices.Import the data
Upload data into ClickHouse, run:format_csv_allow_single_quotes as only double quotes are used for data fields and single quotes can be inside the values and shouldn’t confuse the CSV parser.
We disable input_format_null_as_default as our data doesn’t have NULL. Otherwise ClickHouse will try to parse \N sequences and can be confused with \ in data.
The setting date_time_input_format best_effort allows to parse DateTime fields in wide variety of formats. For example, ISO-8601 without seconds like ‘2000-01-01 01:02’ will be recognized. Without this setting only fixed DateTime format is allowed.
Denormalize the data
Data is presented in multiple tables in normalized form. It means you have to perform JOIN if you want to query, e.g. dish names from menu items. For typical analytical tasks it is way more efficient to deal with pre-JOINed data to avoid doingJOIN every time. It is called “denormalized” data.
We will create a table menu_item_denorm where will contain all the data JOINed together:
Validate the data
Query
Response
Run some queries
Averaged historical prices of dishes
Query
Response
Burger prices
Query
Response
Vodka
Query
Response
ILIKE '%vodka%' and this definitely makes a statement.
Caviar
Let’s print caviar prices. Also let’s print a name of any dish with caviar.Query
Response