Strategy Comparison and Selection Guide

This guide helps you choose the right strategy for your data processing needs by comparing the characteristics, use cases, and performance implications of each strategy.

Quick Strategy Selector

`rst +------------------+----------------+--------------+---------------+------------------+---------------+ | **Use Case**     | **DropCreate** | **Truncate** | **Append**    | **Upsert**       | **Historize** | +==================+================+==============+===============+==================+===============+ | Full reload      |              |            |             |                |             | +------------------+----------------+--------------+---------------+------------------+---------------+ | Incremental load |              |            |             |                |             | +------------------+----------------+--------------+---------------+------------------+---------------+ | Change tracking  |              |            |             |                |             | +------------------+----------------+--------------+---------------+------------------+---------------+ | Schema evolution |              |            |             |                |             | +------------------+----------------+--------------+---------------+------------------+---------------+ | Duplicate        | N/A            | N/A          | None          | Update           | Version       | | handling         |                |              |               |                  |               | +------------------+----------------+--------------+---------------+------------------+---------------+ | Performance      | Fast           | Fast         | Fast          | Medium           | Slow          | +------------------+----------------+--------------+---------------+------------------+---------------+ | Storage overhead | Low            | Low          | Low           | Low              | High          | +------------------+----------------+--------------+---------------+------------------+---------------+ ` .. raw:: html

</div>

Detailed Strategy Comparison

Data Handling Characteristics

DropCreate
  • Target handling: Completely recreates target (drop + create)

  • Source processing: All source data treated as new

  • Existing data: Permanently lost

  • Schema changes: Target inherits source schema exactly

  • Metadata: Uniform across all records

Truncate
  • Target handling: Clears target data, preserves structure

  • Source processing: All source data treated as new

  • Existing data: Removed but structure preserved

  • Schema changes: Applied before loading data

  • Metadata: Uniform across all records

Append
  • Target handling: Preserves all existing data

  • Source processing: All source data added to target

  • Existing data: Completely preserved

  • Schema changes: Applied without affecting existing records

  • Metadata: Mixed hashes (existing vs new data)

Upsert
  • Target handling: Updates existing, inserts new based on keys

  • Source processing: Key-based matching determines insert vs update

  • Existing data: Updated if keys match, otherwise preserved

  • Schema changes: Applied before key-based operations

  • Metadata: Mixed hashes (updated vs unchanged records)

Historize
  • Target handling: Maintains full history of changes over time

  • Source processing: Creates new versions for changed records

  • Existing data: Preserved with validity timestamps

  • Schema changes: Applied while maintaining historical continuity

  • Metadata: Complex versioning with validity periods

Performance Characteristics

Strategy

Write Speed

Read Speed

Storage

Memory

Complexity

DropCreate

⭐⭐⭐⭐⭐

⭐⭐⭐⭐⭐

⭐⭐⭐⭐⭐

⭐⭐⭐⭐

Truncate

⭐⭐⭐⭐⭐

⭐⭐⭐⭐⭐

⭐⭐⭐⭐⭐

⭐⭐⭐⭐

⭐⭐

Append

⭐⭐⭐⭐

⭐⭐⭐⭐

⭐⭐⭐⭐

⭐⭐⭐

⭐⭐

Upsert

⭐⭐⭐

⭐⭐⭐

⭐⭐⭐⭐

⭐⭐

⭐⭐⭐⭐

Historize

⭐⭐

⭐⭐

⭐⭐

⭐⭐

⭐⭐⭐⭐⭐

Use Case Decision Tree

Do you need to preserve existing data?
├── NO → Do you need to preserve target structure?
│   ├── NO → Use DropCreate (fastest, complete recreation)
│   └── YES → Use Truncate (fast, preserves schema)
└── YES → Do you have identifying keys for records?
    ├── NO → Use Append (simple addition, no deduplication)
    └── YES → Do you need to track changes over time?
        ├── NO → Use Upsert (update existing, insert new)
        └── YES → Use Historize (full change tracking)

Real-World Scenarios

ETL Data Warehouse Loading
  • Daily fact table refresh: DropCreate or Truncate

  • Incremental dimension updates: Upsert

  • Event stream processing: Append

  • Slowly changing dimensions: Historize

Data Lake Operations
  • Raw data ingestion: Append

  • Curated data updates: Upsert

  • Historical data preservation: Historize

  • Data quality corrections: DropCreate

Analytics and Reporting
  • Dashboard data refresh: Truncate

  • Audit trail maintenance: Historize

  • Incremental aggregations: Append or Upsert

  • Data mart population: DropCreate

Strategy Selection Guidelines

Choose DropCreate when: - Target schema frequently changes - Data quality issues require clean rebuilds - Processing time is not critical - Storage space is limited - Simple, predictable operations are preferred

Choose Truncate when: - Target schema is stable but data changes completely - Fast reload performance is critical - Schema evolution support is needed - Existing target structure should be preserved

Choose Append when: - All data is valuable and should be retained - Source provides only new/incremental data - Duplicate detection is handled upstream - Simple growth patterns are acceptable

Choose Upsert when: - Source contains mix of new and updated records - Business keys can identify record relationships - Current state accuracy is more important than history - Moderate complexity is acceptable for accuracy

Choose Historize when: - Regulatory compliance requires change tracking - Business analysis needs historical trends - Audit capabilities are essential - Storage costs are less important than data completeness - Complex temporal queries will be performed

Common Anti-Patterns

Avoid these combinations:

  • Append for dimension tables: Use Upsert instead to handle changes

  • DropCreate for large fact tables: Use Truncate for better performance

  • Historize for high-frequency data: Consider Append with custom timestamps

  • Upsert without proper keys: Use Append to avoid incorrect matching

  • Truncate with frequent schema changes: Use DropCreate for flexibility

Migration Strategies

When changing between strategies:

  1. From DropCreate/Truncate to Append/Upsert: Requires one-time historical data load

  2. From Append to Upsert: Requires deduplication and key identification

  3. From Upsert to Historize: Existing data becomes initial historical state

  4. From any strategy to DropCreate: Always safe but loses historical data

Each migration path has specific considerations for data preservation and downtime requirements.