Quick Start
Enabling Profiling
Profiler API
Getting the Profiler
report()
Display a performance report.
| Parameter | Type | Default | Description |
|---|---|---|---|
| min_duration_ms | float | 0.1 | Only show steps >= this duration |
The report includes:
- Duration in milliseconds for each step
- Percentage of parent/total time
- Hierarchical nesting of operations
- Metadata for each step (e.g., ops_count, ops)
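To make the report layout concrete, here is a minimal sketch of how such a report could be rendered from a tree of timed steps. The `Step` dataclass, the `format_report` function, and the sample timings are hypothetical stand-ins and not the library's implementation; only the `min_duration_ms` parameter, its `0.1` default, and the `ops_count` metadata key come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical record of one timed step (not the library's own type)."""
    name: str
    duration_ms: float
    metadata: dict = field(default_factory=dict)
    children: list[Step] = field(default_factory=list)


def format_report(root: Step, min_duration_ms: float = 0.1) -> list[str]:
    """Render a step tree as indented lines, hiding steps below the threshold."""
    lines: list[str] = []

    def visit(step: Step, parent_ms: float, depth: int) -> None:
        if step.duration_ms < min_duration_ms:
            return  # too short to show; its children are hidden with it
        # Percentage is relative to the parent (the root is relative to itself).
        pct = 100.0 * step.duration_ms / parent_ms if parent_ms > 0 else 0.0
        meta = ", ".join(f"{k}={v}" for k, v in step.metadata.items())
        suffix = f" [{meta}]" if meta else ""
        indent = "  " * depth  # nesting is shown by indentation
        lines.append(f"{indent}{step.name}: {step.duration_ms:.2f} ms ({pct:.1f}%){suffix}")
        for child in step.children:
            visit(child, step.duration_ms, depth + 1)

    visit(root, root.duration_ms, 0)
    return lines


# Made-up timings, for illustration only.
root = Step("Total Execution", 10.0, children=[
    Step("Query Planning", 0.05),
    Step("SQL Execution", 8.0, metadata={"ops_count": 3}),
    Step("Result to DataFrame", 2.0),
])
report_lines = format_report(root)
print("\n".join(report_lines))
```

With the default threshold, the 0.05 ms planning step is omitted while the two longer child steps appear indented under the total.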
step()
Manually time a code block.
clear()
Clear all profiling data.
summary()
Get a dictionary of step names to durations (ms).
Understanding the Report
Step Names
| Step Name | Description |
|---|---|
| Total Execution | Overall execution time |
| Query Planning | Time spent planning the query |
| SQL Segment N | Execution of SQL segment N |
| SQL Execution | Actual SQL query execution |
| Result to DataFrame | Converting results to pandas |
| Cache Check | Checking query cache |
| Cache Write | Writing results to cache |
Duration
- Planning steps (Query Planning): Usually fast
- Execution steps (SQL Execution): Where actual work happens
- Transfer steps (Result to DataFrame): Converting data to pandas
Identifying Bottlenecks
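One way to spot a bottleneck is to rank the steps by their share of the total time. The sketch below assumes a dictionary shaped like the one `summary()` is described as returning (step names mapped to durations in ms) and assumes it contains a `Total Execution` entry. The `rank_steps` helper and the sample timings are hypothetical and for illustration only.

```python
from __future__ import annotations


def rank_steps(
    durations_ms: dict[str, float],
    total_key: str = "Total Execution",
) -> list[tuple[str, float, float]]:
    """Return (name, ms, percent of total) tuples, slowest step first."""
    total = durations_ms[total_key]
    ranked = [
        (name, ms, 100.0 * ms / total if total > 0 else 0.0)
        for name, ms in durations_ms.items()
        if name != total_key  # the total itself is not a candidate
    ]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Made-up numbers standing in for a real summary dictionary.
sample = {
    "Total Execution": 120.0,
    "Query Planning": 1.5,
    "SQL Execution": 90.0,
    "Result to DataFrame": 24.0,
    "Cache Check": 0.5,
}

ranking = rank_steps(sample)
for name, ms, pct in ranking:
    print(f"{name:<22}{ms:>8.2f} ms {pct:>6.1f}%")

slowest = ranking[0][0]
print(f"Likely bottleneck: {slowest}")
```

If steps are nested (for example, a segment that contains its own SQL execution), the shares can overlap, so the percentages should be read as a ranking aid and not as an exact breakdown.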
Profiling Patterns
Profile a Single Query
Profile Multiple Queries
Compare Approaches
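The comparison pattern can be sketched without the library: time each approach as its own named step, then read the durations side by side. `MiniProfiler` below is a hypothetical stand-in that mirrors only the method names documented above (`step()`, `summary()`, `clear()`). Treating `step()` as a context manager is an assumption, and the two summed workloads are placeholders for real queries.

```python
from __future__ import annotations

import time
from contextlib import contextmanager


class MiniProfiler:
    """Hypothetical stand-in, not the library's profiler."""

    def __init__(self) -> None:
        self._durations_ms: dict[str, float] = {}

    @contextmanager
    def step(self, name: str):
        """Time the enclosed block and add the elapsed ms to the named step."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self._durations_ms[name] = self._durations_ms.get(name, 0.0) + elapsed_ms

    def summary(self) -> dict[str, float]:
        """Return a copy of step names mapped to durations in ms."""
        return dict(self._durations_ms)

    def clear(self) -> None:
        """Drop all recorded timings."""
        self._durations_ms.clear()


profiler = MiniProfiler()
data = list(range(200_000))

# Approach A: explicit Python loop.
with profiler.step("loop sum"):
    total_loop = 0
    for x in data:
        total_loop += x

# Approach B: built-in sum over the same data.
with profiler.step("builtin sum"):
    total_builtin = sum(data)

results = profiler.summary()
for name, ms in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ms:.2f} ms")

# Reset before the next comparison so old timings do not leak into it.
profiler.clear()
```

Both approaches should produce the same result before their timings are compared, and a single run is noisy, so repeating each approach several times gives a fairer comparison.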
Optimization Tips
- Check SQL Execution Time
  If SQL execution is the bottleneck:
  - Add more filters to reduce data
  - Use Parquet instead of CSV
  - Check for proper indexes (for database sources)
- Check I/O Time
  If read_csv or read_parquet is the bottleneck:
  - Use Parquet (columnar, compressed)
  - Read only needed columns
  - Filter at source if possible
- Check Data Transfer
  If to_df is slow:
  - Result set may be too large
  - Add more filters or limit
  - Use head() for previewing