Ingesting Parquet files from an S3 bucket
Below are some basics of using the S3 table engine to read Parquet files.
- Create access and secret keys for an IAM service user. Normal login users usually don't work, since they may have been configured with an MFA policy.
- Set the permissions on the policy so that the service user can access the bucket and its folders.
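As a minimal sketch of the permissions step, the snippet below builds an IAM policy document that lets the service user list the bucket (needed when the path contains wildcards, because the matching keys have to be listed) and read objects under one folder. The bucket `mars-doc-test` and folder `system_logs` come from the example URLs in this article; the helper name `read_only_s3_policy` is hypothetical, so substitute your own bucket and prefix.

```python
import json


def read_only_s3_policy(bucket: str, prefix: str) -> dict:
    """Hypothetical helper: minimal IAM policy to read one folder of a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing keys is applied to the bucket ARN itself; it lets
                # wildcard paths be expanded into concrete object keys.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading objects is applied to object ARNs under the folder.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, e.g. to save as policy.json.
    print(json.dumps(read_only_s3_policy("mars-doc-test", "system_logs"), indent=2))
```

The saved JSON can then be attached to the service user, for example with `aws iam put-user-policy --user-name <user> --policy-name <name> --policy-document file://policy.json`.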
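Paths can contain wildcards; for example, `**` matches all subdirectories recursively. The full wildcard syntax is described in the `s3` table function documentation: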
https://clickhouse.com/docs/sql-reference/table-functions/s3/
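The `s3` table function takes the URL (which may contain wildcards), then the access key ID and secret access key from the first step, then the format name. As a hedged sketch, the hypothetical helper `s3_select_sql` below only assembles such a query as text, quoting each argument as a ClickHouse string literal; the credentials shown are placeholders, and the printed SQL can be run with `clickhouse-client` or any other ClickHouse client.

```python
def clickhouse_string(value: str) -> str:
    """Quote a value as a ClickHouse string literal (backslash escaping)."""
    # Escape backslashes first so the quote escapes are not doubled.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def s3_select_sql(url: str, key_id: str, secret: str, limit: int = 10) -> str:
    """Hypothetical helper: SELECT over s3(url, key_id, secret, 'Parquet')."""
    args = ", ".join(clickhouse_string(a) for a in (url, key_id, secret, "Parquet"))
    return f"SELECT *\nFROM s3({args})\nLIMIT {int(limit)}"


if __name__ == "__main__":
    # Placeholder credentials: substitute the service user's keys.
    print(s3_select_sql(
        "https://mars-doc-test.s3.amazonaws.com/system_logs/2022/11/01/my-app-logs-0001.parquet",
        "YOUR_ACCESS_KEY_ID",
        "YOUR_SECRET_ACCESS_KEY",
    ))
```

Starting with a single file and a small `LIMIT` is a cheap way to confirm that the credentials and permissions work before switching to a wildcard path.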
For example, assume the bucket's paths follow this structure (a template, then a concrete file):
https://your_s3_bucket.s3.amazonaws.com/<your_folder>/<year>/<month>/<day>/<filename>.parquet
https://mars-doc-test.s3.amazonaws.com/system_logs/2022/11/01/my-app-logs-0001.parquet
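This pattern reads all files for the 1st day of every month in 2021 and 2022. Numeric ranges in ClickHouse globs are written `{N..M}`, `**` spans the month folder, and `*.parquet` matches any Parquet file name:
https://mars-doc-test.s3.amazonaws.com/system_logs/{2021..2022}/**/01/*.parquet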
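To check which keys a pattern would pick up, the sketch below translates a path glob into a Python regular expression. It is an approximation, not ClickHouse's implementation, and it assumes the documented meanings: `*` matches any characters except `/`, `**` also crosses `/`, `{a,b}` lists alternatives, and `{N..M}` is an integer range (leading zeros and `?` are not handled here). Under these rules a hyphenated `{2021-2022}` is a single literal alternative, so it matches no year folder. The function names and the extra sample URLs are hypothetical.

```python
import re


def clickhouse_glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Approximate a ClickHouse s3() path glob as a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")          # any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")       # stays within one path segment
            i += 1
        elif pattern[i] == "{":
            end = pattern.index("}", i)
            body = pattern[i + 1:end]
            rng = re.fullmatch(r"(\d+)\.\.(\d+)", body)
            if rng:                      # {N..M}: every integer in the range
                lo, hi = int(rng.group(1)), int(rng.group(2))
                options = [str(n) for n in range(lo, hi + 1)]
            else:                        # {a,b}: alternatives; "{2021-2022}" is one literal
                options = body.split(",")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matching(pattern: str, urls: list) -> list:
    """Return the URLs that the whole pattern matches."""
    rx = clickhouse_glob_to_regex(pattern)
    return [u for u in urls if rx.fullmatch(u)]


BASE = "https://mars-doc-test.s3.amazonaws.com/system_logs/"
SAMPLE_URLS = [
    BASE + "2022/11/01/my-app-logs-0001.parquet",
    BASE + "2021/03/01/my-app-logs-0007.parquet",  # hypothetical
    BASE + "2022/11/02/my-app-logs-0002.parquet",  # hypothetical
    BASE + "2020/01/01/my-app-logs-0003.parquet",  # hypothetical
]

if __name__ == "__main__":
    for pat in (BASE + "{2021..2022}/**/01/*.parquet",
                BASE + "{2021-2022}/**/01/*.parquet"):
        print(pat, "->", matching(pat, SAMPLE_URLS))
```

With the range form, only the first two sample URLs match: the 2nd-of-month file and the 2020 file are excluded.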