Strategy Comparison and Selection Guide ======================================= This guide helps you choose the right strategy for your data processing needs by comparing the characteristics, use cases, and performance implications of each strategy. Quick Strategy Selector ----------------------- .. raw:: html
```rst +------------------+----------------+--------------+---------------+------------------+---------------+ | **Use Case** | **DropCreate** | **Truncate** | **Append** | **Upsert** | **Historize** | +==================+================+==============+===============+==================+===============+ | Full reload | ✅ | ✅ | ❌ | ❌ | ❌ | +------------------+----------------+--------------+---------------+------------------+---------------+ | Incremental load | ❌ | ❌ | ✅ | ✅ | ✅ | +------------------+----------------+--------------+---------------+------------------+---------------+ | Change tracking | ❌ | ❌ | ❌ | ❌ | ✅ | +------------------+----------------+--------------+---------------+------------------+---------------+ | Schema evolution | ❌ | ✅ | ✅ | ✅ | ✅ | +------------------+----------------+--------------+---------------+------------------+---------------+ | Duplicate | N/A | N/A | None | Update | Version | | handling | | | | | | +------------------+----------------+--------------+---------------+------------------+---------------+ | Performance | Fast | Fast | Fast | Medium | Slow | +------------------+----------------+--------------+---------------+------------------+---------------+ | Storage overhead | Low | Low | Low | Low | High | +------------------+----------------+--------------+---------------+------------------+---------------+ ``` .. raw:: html
Detailed Strategy Comparison ---------------------------- Data Handling Characteristics ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **DropCreate** - **Target handling**: Completely recreates target (drop + create) - **Source processing**: All source data treated as new - **Existing data**: Permanently lost - **Schema changes**: Target inherits source schema exactly - **Metadata**: Uniform across all records **Truncate** - **Target handling**: Clears target data, preserves structure - **Source processing**: All source data treated as new - **Existing data**: Removed but structure preserved - **Schema changes**: Applied before loading data - **Metadata**: Uniform across all records **Append** - **Target handling**: Preserves all existing data - **Source processing**: All source data added to target - **Existing data**: Completely preserved - **Schema changes**: Applied without affecting existing records - **Metadata**: Mixed hashes (existing vs new data) **Upsert** - **Target handling**: Updates existing, inserts new based on keys - **Source processing**: Key-based matching determines insert vs update - **Existing data**: Updated if keys match, otherwise preserved - **Schema changes**: Applied before key-based operations - **Metadata**: Mixed hashes (updated vs unchanged records) **Historize** - **Target handling**: Maintains full history of changes over time - **Source processing**: Creates new versions for changed records - **Existing data**: Preserved with validity timestamps - **Schema changes**: Applied while maintaining historical continuity - **Metadata**: Complex versioning with validity periods Performance Characteristics ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. list-table:: :header-rows: 1 :widths: 20 15 15 15 15 20 * - Strategy - Write Speed - Read Speed - Storage - Memory - Complexity * - DropCreate - ⭐⭐⭐⭐⭐ - ⭐⭐⭐⭐⭐ - ⭐⭐⭐⭐⭐ - ⭐⭐⭐⭐ - ⭐ * - Truncate - ⭐⭐⭐⭐⭐ - ⭐⭐⭐⭐⭐ - ⭐⭐⭐⭐⭐ - ⭐⭐⭐⭐ - ⭐⭐ * - Append - ⭐⭐⭐⭐ - ⭐⭐⭐⭐ - ⭐⭐⭐⭐ - ⭐⭐⭐ - ⭐⭐ * - Upsert - ⭐⭐⭐ - ⭐⭐⭐ - ⭐⭐⭐⭐ - ⭐⭐ - ⭐⭐⭐⭐ * - Historize - ⭐⭐ - ⭐⭐ - ⭐⭐ - ⭐⭐ - ⭐⭐⭐⭐⭐ Use Case Decision Tree ---------------------- .. code-block:: text Do you need to preserve existing data? ├── NO → Do you need to preserve target structure? │ ├── NO → Use DropCreate (fastest, complete recreation) │ └── YES → Use Truncate (fast, preserves schema) └── YES → Do you have identifying keys for records? ├── NO → Use Append (simple addition, no deduplication) └── YES → Do you need to track changes over time? ├── NO → Use Upsert (update existing, insert new) └── YES → Use Historize (full change tracking) Real-World Scenarios -------------------- **ETL Data Warehouse Loading** - **Daily fact table refresh**: DropCreate or Truncate - **Incremental dimension updates**: Upsert - **Event stream processing**: Append - **Slowly changing dimensions**: Historize **Data Lake Operations** - **Raw data ingestion**: Append - **Curated data updates**: Upsert - **Historical data preservation**: Historize - **Data quality corrections**: DropCreate **Analytics and Reporting** - **Dashboard data refresh**: Truncate - **Audit trail maintenance**: Historize - **Incremental aggregations**: Append or Upsert - **Data mart population**: DropCreate Strategy Selection Guidelines ----------------------------- **Choose DropCreate when:** - Target schema frequently changes - Data quality issues require clean rebuilds - Processing time is not critical - Storage space is limited - Simple, predictable operations are preferred **Choose Truncate when:** - Target schema is stable but data changes completely - Fast reload performance is critical - Schema evolution support is needed - Existing target structure should be preserved **Choose Append when:** - All data is valuable and should be retained - Source provides only new/incremental data - Duplicate detection is handled upstream - Simple growth patterns are acceptable **Choose Upsert when:** - Source contains mix of new and updated records - Business keys can identify record relationships - Current state accuracy is more important than history - Moderate complexity is acceptable for accuracy **Choose Historize when:** - Regulatory compliance requires change tracking - Business analysis needs historical trends - Audit capabilities are essential - Storage costs are less important than data completeness - Complex temporal queries will be performed Common Anti-Patterns -------------------- **Avoid these combinations:** - **Append for dimension tables**: Use Upsert instead to handle changes - **DropCreate for large fact tables**: Use Truncate for better performance - **Historize for high-frequency data**: Consider Append with custom timestamps - **Upsert without proper keys**: Use Append to avoid incorrect matching - **Truncate with frequent schema changes**: Use DropCreate for flexibility Migration Strategies -------------------- **When changing between strategies:** 1. **From DropCreate/Truncate to Append/Upsert**: Requires one-time historical data load 2. **From Append to Upsert**: Requires deduplication and key identification 3. **From Upsert to Historize**: Existing data becomes initial historical state 4. **From any strategy to DropCreate**: Always safe but loses historical data Each migration path has specific considerations for data preservation and downtime requirements.