Setup
Let’s first create a virtual environment:ipython to run the commands in the rest of the guide, which you can launch by running:
An intro to ClickPy
The remote ClickHouse server that we’re going to query is ClickPy. ClickPy keeps track of all the downloads of PyPI packages and lets you explore the stats of packages via a UI. The underlying database is available to query using theplay user.
You can learn more about ClickPy in its GitHub repository.
Querying the ClickPy ClickHouse service
Let’s import chDB:remoteSecure function.
This function takes in a host name, table name, and username at a minimum.
We can write the following query to return the number of downloads per day of the openai package as a Pandas DataFrame:
scikit-learn:
Merging Pandas DataFrames
We now have two DataFrames, which we can merge together based on date (which is thex column) like this:
scikit-learn downloads like this: