The following commands were executed on a Production instance of ClickHouse Cloud. You can easily run them on a local install as well.
- Let’s see what the data looks like:
- Now let’s view some of the rows:
url function easily reads data from a CSV file:
- We will create a table now that we know what the data looks like:
- The following command inserts the entire dataset into the
covid19table:
- It goes pretty quick - let’s see how many rows were inserted:
- Let’s see how many total cases of Covid-19 were recorded:
- You will notice the data has a lot of 0’s for dates - either weekends or days when numbers weren’t reported each day. We can use a window function to smooth out the daily averages of new cases:
- This query determines the latest values for each location. We can’t use
max(date)because not all countries reported every day, so we grab the last row usingROW_NUMBER:
- We can use
lagInFrameto determine theLAGof new cases each day. In this query we filter by theUS_DClocation:
- This query calculates the percentage of change in new cases each day, and includes a simple
increaseordecreasecolumn in the result set:
As mentioned in the GitHub repo, the dataset is no longer updated as of September 15, 2022.