The One-Line Migration
The simplest migration is changing your import:
Step-by-Step Migration
Install chDB
Change the Import
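A minimal sketch of the import swap follows. The module path `chdb.datastore` is an assumption and should be checked against the installed chDB version; the fallback to pandas is an optional convenience for machines where chDB is not installed, not something the guide requires.

```python
# Before the migration the script started with:
#   import pandas as pd

# After the migration the same alias points at DataStore.
# ASSUMPTION: the module path below may differ in your chDB version.
try:
    from chdb import datastore as pd
except ImportError:
    # Optional fallback so the script still runs without chDB installed.
    import pandas as pd

# The rest of the script keeps using the `pd` alias unchanged,
# for example pd.read_csv(...) followed by filtering and groupby.
```

Keeping the alias `pd` means the remaining code does not need to be touched until one of the differences described below shows up.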
Handle Any Differences
A few operations behave differently. See Key Differences below.
What Works Unchanged
Data Loading
Filtering
Selection
GroupBy and Aggregation
Sorting
String Operations
DateTime Operations
I/O Operations
Key Differences
- Lazy Evaluation
DataStore operations are lazy: they don’t execute until results are needed.
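To make the idea of lazy evaluation concrete, here is a toy pipeline in plain Python. It is not the DataStore implementation; the class `LazyPipeline` and its methods are hypothetical names used only to show that building a pipeline does no work until results are requested.

```python
class LazyPipeline:
    """Toy stand-in for a lazy table (hypothetical, not the DataStore API)."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Record the step and return a new pipeline; nothing runs yet.
        return LazyPipeline(self._rows, self._steps + (predicate,))

    def collect(self):
        # Execution happens only here, when results are requested.
        return [r for r in self._rows if all(p(r) for p in self._steps)]


calls = []

def is_large(row):
    calls.append(row)  # track when the predicate is actually evaluated
    return row["sales"] > 15

rows = [{"sales": 10}, {"sales": 20}, {"sales": 30}]
pipeline = LazyPipeline(rows).filter(is_large)
calls_before_collect = len(calls)  # still 0: the filter has not run
result = pipeline.collect()        # the filter runs now
```

The practical consequence for migrated code is that errors and run time can surface at the point where results are consumed rather than where the operation was written.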
- Return Types
| Operation | pandas Returns | DataStore Returns |
|---|---|---|
| df['col'] | Series | ColumnExpr (lazy) |
| df[['a', 'b']] | DataFrame | DataStore (lazy) |
| df[condition] | DataFrame | DataStore (lazy) |
| df.groupby('x') | GroupBy | LazyGroupBy |
- No inplace Parameter
DataStore doesn’t support inplace=True. Always use the return value:
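The rewrite is mechanical: drop `inplace=True` and reassign the result. The sketch below runs on plain pandas so it can be checked without chDB; the column names are made up for illustration, and the same reassignment style is what the rule above asks for with DataStore.

```python
import pandas as pd

df = pd.DataFrame({"name": ["b", "a", "c"], "score": [1.0, None, 3.0]})

# pandas-only style that relies on in-place mutation:
#   df.dropna(inplace=True)
#   df.sort_values("name", inplace=True)

# Portable style: every call returns a new object, so keep the return value.
df = df.dropna()
df = df.sort_values("name")

names = list(df["name"])
```

Code written this way behaves the same in pandas, so it can be changed before the import is switched.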
- Comparing DataStores
pandas doesn’t recognize DataStore objects, so use to_pandas() for comparison:
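One way to apply this is a small comparison helper that converts before comparing. The sketch below is an assumption-laden illustration: `as_pandas`, `frames_match`, and the `FakeStore` stand-in are hypothetical names, and only `to_pandas()` comes from the text above.

```python
import pandas as pd


def as_pandas(obj):
    """Return a pandas DataFrame, converting via to_pandas() when available."""
    return obj.to_pandas() if hasattr(obj, "to_pandas") else obj


def frames_match(left, right):
    """Compare two results by value, ignoring row order and index labels."""
    a, b = as_pandas(left), as_pandas(right)
    if list(a.columns) != list(b.columns):
        return False
    cols = list(a.columns)
    a = a.sort_values(cols).reset_index(drop=True)
    b = b.sort_values(cols).reset_index(drop=True)
    return a.equals(b)


class FakeStore:
    """Hypothetical stand-in for a DataStore so the sketch runs without chDB."""

    def __init__(self, frame):
        self._frame = frame

    def to_pandas(self):
        return self._frame.copy()


expected = pd.DataFrame({"id": [1, 2], "total": [10, 20]})
candidate = FakeStore(expected.iloc[::-1])  # same rows, reversed order
same = frames_match(candidate, expected)
```

Sorting both sides before comparing keeps the check from failing on row order alone; `DataFrame.equals` still requires matching dtypes, so a type difference will show up as a mismatch.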
- Row Order
As with SQL databases, DataStore may not preserve row order when reading from file sources. Sort explicitly whenever the order of rows matters:
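A short illustration of sorting before taking the first rows, shown on plain pandas so it runs without chDB. The column names are invented; the guide lists sorting among the operations that work unchanged, so the same `sort_values` call is assumed to carry over.

```python
import pandas as pd

df = pd.DataFrame({"id": [3, 1, 2], "value": ["c", "a", "b"]})

# Order-dependent: which rows head() returns depends on the incoming order.
first_two_unsorted = df.head(2)

# Deterministic: sort on a key first, then take the first rows.
first_two_sorted = df.sort_values("id").head(2)

ids = list(first_two_sorted["id"])
```

Any step that depends on position, such as `head`, `tail`, or positional indexing, is worth preceding with an explicit sort.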
Migration Patterns
Pattern 1: Read-Analyze-Write
Pattern 2: DataFrame with pandas Operations
If you need pandas-specific features, convert at the end:
Pattern 3: Mixed Workflow
Performance Comparison
DataStore is significantly faster for large datasets. In the benchmarks below, speedups range from 2.88x to 19.93x:
| Operation | pandas | DataStore | Speedup |
|---|---|---|---|
| GroupBy count | 347ms | 17ms | 19.93x |
| Complex pipeline | 2,047ms | 380ms | 5.39x |
| Filter+Sort+Head | 1,537ms | 350ms | 4.40x |
| GroupBy agg | 406ms | 141ms | 2.88x |
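To check how these figures carry over to your own workload, a small timing harness is enough. The sketch below uses only the standard library; `time_ms` and `speedup` are hypothetical helper names, and the workload shown is a placeholder.

```python
import time


def time_ms(fn, repeats=3):
    """Return the best wall-clock time of fn() in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


def speedup(baseline_ms, candidate_ms):
    """Ratio of baseline time to candidate time; above 1 means faster."""
    return baseline_ms / candidate_ms


# Placeholder workload; replace with the pandas and DataStore pipelines.
elapsed = time_ms(lambda: sum(range(100_000)))

# Ratio for the "Complex pipeline" row of the table above.
complex_pipeline = round(speedup(2047, 380), 2)
```

Because DataStore is lazy, the timed function has to force execution (for example by converting the result to pandas); otherwise only the cost of building the pipeline is measured.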
Troubleshooting Migration
Issue: Operation Not Working
Some pandas operations may not be supported. Check:
- Is the operation in the compatibility list?
- Try converting to pandas first:
ds.to_df().operation()
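The fallback above can be wrapped in a small helper. This is a sketch under assumptions: it treats a missing attribute as the sign that an operation is unsupported, which may not match how DataStore actually reports it. `call_or_fallback` and `FakeStore` are hypothetical names; only `to_df()` comes from the text above.

```python
import pandas as pd


def call_or_fallback(ds, method_name, *args, **kwargs):
    """Call ds.<method_name>; if it is missing, call it on ds.to_df() instead."""
    method = getattr(ds, method_name, None)
    if method is not None:
        return method(*args, **kwargs)
    # ASSUMPTION: unsupported operations are absent rather than raising.
    return getattr(ds.to_df(), method_name)(*args, **kwargs)


class FakeStore:
    """Hypothetical stand-in for a DataStore so the sketch runs without chDB."""

    def __init__(self, frame):
        self._frame = frame

    def to_df(self):
        return self._frame


ds = FakeStore(pd.DataFrame({"a": [1, 2, 3]}))
wide = call_or_fallback(ds, "transpose")  # not defined on FakeStore
```

Converting materializes the data in pandas, so it is best done after filtering and aggregation have reduced the row count.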
Issue: Different Results
Enable debug logging to understand what’s happening:
Issue: Slow Performance
Check your execution pattern:
Issue: Type Mismatches
DataStore may infer types differently:
Gradual Migration Strategy
Week 1: Test Compatibility
Week 2: Switch Simple Scripts
Start with scripts that:
- Read large files
- Do filtering and aggregation
- Don’t use custom apply functions
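One rough way to shortlist such scripts is to scan their source for the patterns this guide flags. The sketch below is a heuristic, not a compatibility checker: `migration_flags` and the pattern table are hypothetical, and a plain text search will miss some cases and over-report others.

```python
import re

# Patterns taken from this guide: custom apply functions and inplace=True.
RISKY_PATTERNS = {
    "custom apply": re.compile(r"\.apply\("),
    "inplace=True": re.compile(r"inplace\s*=\s*True"),
}


def migration_flags(source: str) -> list[str]:
    """Return the labels of risky patterns found in a script's source text."""
    return [label for label, pattern in RISKY_PATTERNS.items()
            if pattern.search(source)]


easy = migration_flags("out = df[df.x > 1].groupby('x').sum()")
harder = migration_flags("df.dropna(inplace=True)\ndf['y'] = df.x.apply(f)")
```

Scripts that come back with no flags are reasonable first candidates; flagged scripts can wait until the simpler ones have been switched and verified.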