
What is Chunking?

Chunking splits large documents into smaller pieces for processing. This enables:
  • Memory-efficient processing of large documents
  • Better extraction accuracy by focusing on relevant sections
  • Parallel processing for faster results
  • Handling documents that exceed model context limits

When to Use Chunking

Use Chunking

  • Documents over 10 pages
  • Complex multi-section documents
  • Memory-intensive processing
  • Extracting data scattered across pages

Skip Chunking

  • Single-page documents
  • Simple 2-3 page documents
  • When the data is on the first page only
  • When speed is the top priority

Enabling Chunking

Set use_chunk: true and provide a chunking configuration:
{
  "config": {
    "use_chunk": true,
    "chunking_config": {
      "strategy": "PAGE",
      "target_chunk_size": 2
    }
  }
}
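As a rough illustration, the configuration above can be assembled in code before it is sent. The sketch below uses Python. The helper name build_chunking_config and its positive-integer check are assumptions made for this example, not part of the documented API.

```python
import json


def build_chunking_config(strategy="PAGE", target_chunk_size=1):
    """Return a `config` object with chunking enabled (hypothetical helper)."""
    # Assumption: chunk sizes are positive integers; the docs only show such values.
    if not isinstance(target_chunk_size, int) or target_chunk_size < 1:
        raise ValueError("target_chunk_size must be a positive integer")
    return {
        "use_chunk": True,
        "chunking_config": {
            "strategy": strategy,
            "target_chunk_size": target_chunk_size,
        },
    }


# Reproduces the JSON shown above: PAGE strategy, 2 pages per chunk.
payload = {"config": build_chunking_config("PAGE", 2)}
print(json.dumps(payload, indent=2))
```

Building the object in one place keeps use_chunk and chunking_config from drifting apart across requests.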

Chunking Strategies

The available strategies depend on whether the parser is enabled.

With Parser (use_parser: true)

All chunking strategies are available:
Chunk by Pages
Groups the document into page-based chunks.
{
  "strategy": "PAGE",
  "target_chunk_size": 2  // 2 pages per chunk
}
When to use:
  • Document data organized by pages
  • Simple page-by-page processing
  • Invoices with line items per page
target_chunk_size: Number of pages per chunk (default: 1)
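To make the page grouping concrete, here is a minimal Python sketch of how consecutive pages could be batched for a given target_chunk_size. The function group_pages is hypothetical and only models the idea. In particular, putting leftover pages into a shorter final chunk is an assumption; the service may handle the remainder differently.

```python
def group_pages(page_numbers, target_chunk_size=1):
    """Split a list of page numbers into consecutive groups (illustrative only)."""
    if target_chunk_size < 1:
        raise ValueError("target_chunk_size must be at least 1")
    return [
        page_numbers[start:start + target_chunk_size]
        for start in range(0, len(page_numbers), target_chunk_size)
    ]


# A 5-page document with 2 pages per chunk.
chunks = group_pages([1, 2, 3, 4, 5], target_chunk_size=2)
print(chunks)  # [[1, 2], [3, 4], [5]]
```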

Without Parser (use_parser: false)

Only the PAGE strategy is available when use_parser: false.
{
  "config": {
    "use_parser": false,
    "use_chunk": true,
    "chunking_config": {
      "strategy": "PAGE",
      "target_chunk_size": 1
    }
  }
}
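A client-side check can catch a mismatched strategy before a request is sent. The following Python sketch is one possible way to do that. The helper check_strategy is hypothetical, the strategy names come from the target_chunk_size table later on this page, and treating only an explicit use_parser: false as "parser disabled" is an assumption, since the default for an omitted key is not stated here.

```python
# Strategy names taken from the target_chunk_size table in this document.
KNOWN_STRATEGIES = {"PAGE", "VARIABLE", "SECTION", "PAGE_SECTIONS", "BLOCK"}


def check_strategy(config):
    """Return a list of problems with the chunking settings (hypothetical helper)."""
    problems = []
    if not config.get("use_chunk"):
        return problems  # chunking is off, nothing to check
    strategy = config.get("chunking_config", {}).get("strategy")
    if strategy is None:
        return problems  # no strategy given; left to the service
    if strategy not in KNOWN_STRATEGIES:
        problems.append(f"unknown strategy: {strategy!r}")
    # Assumption: only an explicit use_parser=False counts as "parser disabled".
    elif config.get("use_parser") is False and strategy != "PAGE":
        problems.append(f"{strategy} needs use_parser: true; only PAGE works without the parser")
    return problems


bad = {
    "use_parser": False,
    "use_chunk": True,
    "chunking_config": {"strategy": "SECTION", "target_chunk_size": 1},
}
print(check_strategy(bad))
```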

target_chunk_size Parameter

The meaning of target_chunk_size varies by strategy:
Strategy         target_chunk_size Meaning            Default
PAGE             Number of pages per chunk            1
VARIABLE         Target characters per chunk          1000
SECTION          Number of sections per chunk         1
PAGE_SECTIONS    Chunking granularity                 1
BLOCK            Number of layout blocks per chunk    1
Start with default values and adjust based on document complexity and extraction results.
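The defaults in the table can be captured in a small lookup so that client code knows which value applies when target_chunk_size is left out. This is a Python sketch. The name resolve_chunk_size is hypothetical, and whether the service applies these defaults in exactly this way is an assumption.

```python
# Defaults copied from the table above.
DEFAULT_TARGET_CHUNK_SIZE = {
    "PAGE": 1,           # pages per chunk
    "VARIABLE": 1000,    # target characters per chunk
    "SECTION": 1,        # sections per chunk
    "PAGE_SECTIONS": 1,  # chunking granularity
    "BLOCK": 1,          # layout blocks per chunk
}


def resolve_chunk_size(chunking_config):
    """Return the explicit target_chunk_size, or the table default for the strategy."""
    strategy = chunking_config["strategy"]
    if strategy not in DEFAULT_TARGET_CHUNK_SIZE:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return chunking_config.get("target_chunk_size", DEFAULT_TARGET_CHUNK_SIZE[strategy])


print(resolve_chunk_size({"strategy": "VARIABLE"}))                       # 1000
print(resolve_chunk_size({"strategy": "BLOCK", "target_chunk_size": 5}))  # 5
```

Note that the unit changes with the strategy: 1000 is a reasonable size for VARIABLE (characters) but would be far too large for PAGE (pages).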

Strategy Selection Guide

Recommended: PAGE
Process 2-3 pages at a time for invoices with line items across pages.
{
  "strategy": "PAGE",
  "target_chunk_size": 2
}
Recommended: SECTION
Chunk by contract sections/clauses for better context.
{
  "strategy": "SECTION",
  "target_chunk_size": 2
}
Recommended: SECTION or PAGE_SECTIONS
Maintain section context while respecting page boundaries.
{
  "strategy": "SECTION",
  "target_chunk_size": 1
}
Recommended: BLOCK
Group related form blocks together.
{
  "strategy": "BLOCK",
  "target_chunk_size": 5
}
Recommended: Disable chunking
Single-page simple documents don’t need chunking.
{
  "use_chunk": false
}

Complete Configuration Examples

Template with Chunking

{
  "name": "Contract Extractor",
  "schema": {...},
  "config": {
    "use_parser": true,
    "parser_mode": "PRO",
    "use_chunk": true,
    "chunking_config": {
      "strategy": "SECTION",
      "target_chunk_size": 2
    }
  }
}

Request with Chunking Override

{
  "files": "https://example.com/long-contract.pdf",
  "template_id": "CONTRACT",
  "config": {
    "use_chunk": true,
    "chunking_config": {
      "strategy": "PAGE",
      "target_chunk_size": 3
    }
  }
}
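How a request-level config combines with the template's config is not spelled out here. One plausible reading, sketched below in Python, is that request values take precedence key by key while untouched template settings carry over. The function name effective_config and the recursive merge rule are assumptions for illustration only; the service may resolve overrides differently. The template values reuse the "Template with Chunking" example.

```python
def effective_config(template_config, request_config):
    """Merge two config dicts, letting request values win (assumed semantics)."""
    merged = dict(template_config)
    for key, value in request_config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested objects such as chunking_config key by key.
            merged[key] = effective_config(merged[key], value)
        else:
            merged[key] = value
    return merged


template_config = {
    "use_parser": True,
    "parser_mode": "PRO",
    "use_chunk": True,
    "chunking_config": {"strategy": "SECTION", "target_chunk_size": 2},
}
request_config = {
    "use_chunk": True,
    "chunking_config": {"strategy": "PAGE", "target_chunk_size": 3},
}

result = effective_config(template_config, request_config)
print(result["chunking_config"])  # {'strategy': 'PAGE', 'target_chunk_size': 3}
```

Under this reading, the parser settings from the template stay in effect and only the chunking strategy and size change for this request.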

Chunking without Parser

{
  "files": "https://example.com/document.pdf",
  "schema": {...},
  "config": {
    "use_parser": false,
    "use_chunk": true,
    "chunking_config": {
      "strategy": "PAGE",  // Only PAGE available
      "target_chunk_size": 1
    }
  }
}

Best Practices

Start Simple

Begin with PAGE strategy, adjust if needed

Match Document Structure

Choose strategy that aligns with document organization

Test Chunk Sizes

Experiment with target_chunk_size for optimal results

Monitor Results

Check extraction quality and adjust strategy

Very small chunk sizes may reduce accuracy by splitting related content. Very large chunks may exceed context limits.

Performance Considerations

Strategy         Processing Speed    Memory Usage    Accuracy
PAGE             Fast                Low             Good
VARIABLE         Fast                Low             Good
SECTION          Medium              Medium          Better
PAGE_SECTIONS    Medium              Medium          Better
BLOCK            Slower              Higher          Best
