Strategy Comparison and Selection Guide
=======================================
This guide helps you choose the right strategy for your data processing needs by comparing the characteristics, use cases, and performance implications of each strategy.
Quick Strategy Selector
-----------------------
.. raw:: html
```rst
+------------------+----------------+--------------+---------------+------------------+---------------+
| **Use Case** | **DropCreate** | **Truncate** | **Append** | **Upsert** | **Historize** |
+==================+================+==============+===============+==================+===============+
| Full reload | ✅ | ✅ | ❌ | ❌ | ❌ |
+------------------+----------------+--------------+---------------+------------------+---------------+
| Incremental load | ❌ | ❌ | ✅ | ✅ | ✅ |
+------------------+----------------+--------------+---------------+------------------+---------------+
| Change tracking | ❌ | ❌ | ❌ | ❌ | ✅ |
+------------------+----------------+--------------+---------------+------------------+---------------+
| Schema evolution | ❌ | ✅ | ✅ | ✅ | ✅ |
+------------------+----------------+--------------+---------------+------------------+---------------+
| Duplicate | N/A | N/A | None | Update | Version |
| handling | | | | | |
+------------------+----------------+--------------+---------------+------------------+---------------+
| Performance | Fast | Fast | Fast | Medium | Slow |
+------------------+----------------+--------------+---------------+------------------+---------------+
| Storage overhead | Low | Low | Low | Low | High |
+------------------+----------------+--------------+---------------+------------------+---------------+
```
.. raw:: html
Detailed Strategy Comparison
----------------------------
Data Handling Characteristics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**DropCreate**
- **Target handling**: Completely recreates target (drop + create)
- **Source processing**: All source data treated as new
- **Existing data**: Permanently lost
- **Schema changes**: Target inherits source schema exactly
- **Metadata**: Uniform across all records
**Truncate**
- **Target handling**: Clears target data, preserves structure
- **Source processing**: All source data treated as new
- **Existing data**: Removed but structure preserved
- **Schema changes**: Applied before loading data
- **Metadata**: Uniform across all records
**Append**
- **Target handling**: Preserves all existing data
- **Source processing**: All source data added to target
- **Existing data**: Completely preserved
- **Schema changes**: Applied without affecting existing records
- **Metadata**: Mixed hashes (existing vs new data)
**Upsert**
- **Target handling**: Updates existing, inserts new based on keys
- **Source processing**: Key-based matching determines insert vs update
- **Existing data**: Updated if keys match, otherwise preserved
- **Schema changes**: Applied before key-based operations
- **Metadata**: Mixed hashes (updated vs unchanged records)
**Historize**
- **Target handling**: Maintains full history of changes over time
- **Source processing**: Creates new versions for changed records
- **Existing data**: Preserved with validity timestamps
- **Schema changes**: Applied while maintaining historical continuity
- **Metadata**: Complex versioning with validity periods
Performance Characteristics
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. list-table::
:header-rows: 1
:widths: 20 15 15 15 15 20
* - Strategy
- Write Speed
- Read Speed
- Storage
- Memory
- Complexity
* - DropCreate
- ⭐⭐⭐⭐⭐
- ⭐⭐⭐⭐⭐
- ⭐⭐⭐⭐⭐
- ⭐⭐⭐⭐
- ⭐
* - Truncate
- ⭐⭐⭐⭐⭐
- ⭐⭐⭐⭐⭐
- ⭐⭐⭐⭐⭐
- ⭐⭐⭐⭐
- ⭐⭐
* - Append
- ⭐⭐⭐⭐
- ⭐⭐⭐⭐
- ⭐⭐⭐⭐
- ⭐⭐⭐
- ⭐⭐
* - Upsert
- ⭐⭐⭐
- ⭐⭐⭐
- ⭐⭐⭐⭐
- ⭐⭐
- ⭐⭐⭐⭐
* - Historize
- ⭐⭐
- ⭐⭐
- ⭐⭐
- ⭐⭐
- ⭐⭐⭐⭐⭐
Use Case Decision Tree
----------------------
.. code-block:: text
Do you need to preserve existing data?
├── NO → Do you need to preserve target structure?
│ ├── NO → Use DropCreate (fastest, complete recreation)
│ └── YES → Use Truncate (fast, preserves schema)
└── YES → Do you have identifying keys for records?
├── NO → Use Append (simple addition, no deduplication)
└── YES → Do you need to track changes over time?
├── NO → Use Upsert (update existing, insert new)
└── YES → Use Historize (full change tracking)
Real-World Scenarios
--------------------
**ETL Data Warehouse Loading**
- **Daily fact table refresh**: DropCreate or Truncate
- **Incremental dimension updates**: Upsert
- **Event stream processing**: Append
- **Slowly changing dimensions**: Historize
**Data Lake Operations**
- **Raw data ingestion**: Append
- **Curated data updates**: Upsert
- **Historical data preservation**: Historize
- **Data quality corrections**: DropCreate
**Analytics and Reporting**
- **Dashboard data refresh**: Truncate
- **Audit trail maintenance**: Historize
- **Incremental aggregations**: Append or Upsert
- **Data mart population**: DropCreate
Strategy Selection Guidelines
-----------------------------
**Choose DropCreate when:**
- Target schema frequently changes
- Data quality issues require clean rebuilds
- Processing time is not critical
- Storage space is limited
- Simple, predictable operations are preferred
**Choose Truncate when:**
- Target schema is stable but data changes completely
- Fast reload performance is critical
- Schema evolution support is needed
- Existing target structure should be preserved
**Choose Append when:**
- All data is valuable and should be retained
- Source provides only new/incremental data
- Duplicate detection is handled upstream
- Simple growth patterns are acceptable
**Choose Upsert when:**
- Source contains mix of new and updated records
- Business keys can identify record relationships
- Current state accuracy is more important than history
- Moderate complexity is acceptable for accuracy
**Choose Historize when:**
- Regulatory compliance requires change tracking
- Business analysis needs historical trends
- Audit capabilities are essential
- Storage costs are less important than data completeness
- Complex temporal queries will be performed
Common Anti-Patterns
--------------------
**Avoid these combinations:**
- **Append for dimension tables**: Use Upsert instead to handle changes
- **DropCreate for large fact tables**: Use Truncate for better performance
- **Historize for high-frequency data**: Consider Append with custom timestamps
- **Upsert without proper keys**: Use Append to avoid incorrect matching
- **Truncate with frequent schema changes**: Use DropCreate for flexibility
Migration Strategies
--------------------
**When changing between strategies:**
1. **From DropCreate/Truncate to Append/Upsert**: Requires one-time historical data load
2. **From Append to Upsert**: Requires deduplication and key identification
3. **From Upsert to Historize**: Existing data becomes initial historical state
4. **From any strategy to DropCreate**: Always safe but loses historical data
Each migration path has specific considerations for data preservation and downtime requirements.