What is Chunking?
Chunking splits large documents into smaller pieces for processing. This enables:
- Memory-efficient processing of large documents
- Better extraction accuracy by focusing on relevant sections
- Parallel processing for faster results
- Handling documents that exceed model context limits
When to Use Chunking
Use Chunking
- Documents over 10 pages
- Complex multi-section documents
- Memory-intensive processing
- Extracting data scattered across pages
Skip Chunking
- Single-page documents
- Simple 2-3 page documents
- Documents whose data appears only on the first page
- Cases where speed is the top priority
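The rules of thumb above can be condensed into a small decision helper. This is a minimal sketch, not part of the product: `DocumentProfile`, its fields and `should_chunk` are hypothetical names, and the ordering (skip conditions such as speed being the top priority win over use conditions) is an assumption you may want to change.

```python
from dataclasses import dataclass


@dataclass
class DocumentProfile:
    """Hypothetical summary of a document; field names are illustrative only."""
    page_count: int
    multi_section: bool = False
    memory_intensive: bool = False
    data_scattered: bool = False
    data_on_first_page_only: bool = False
    speed_critical: bool = False


def should_chunk(doc: DocumentProfile) -> bool:
    """Apply the 'Use Chunking' / 'Skip Chunking' guidance (assumed precedence)."""
    # Skip cases: single-page documents, data confined to page 1, or speed first.
    if doc.page_count <= 1 or doc.data_on_first_page_only or doc.speed_critical:
        return False
    # Use cases: long (>10 pages), multi-section, memory-heavy, or scattered data.
    if (doc.page_count > 10 or doc.multi_section
            or doc.memory_intensive or doc.data_scattered):
        return True
    # Short, simple documents (e.g. 2-3 pages) default to no chunking.
    return False


print(should_chunk(DocumentProfile(page_count=25)))  # True
print(should_chunk(DocumentProfile(page_count=3)))   # False
```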
Enabling Chunking
Set use_chunk: true and provide a chunking configuration.
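The following sketch builds such a configuration as a Python dict. Only use_chunk, use_parser, target_chunk_size and the strategy names come from this guide; the nesting under a "chunking" key, the "strategy" key name and the `build_chunking_config` helper are assumptions for illustration, so check the actual API reference for the real field layout.

```python
import json

# Strategy names as listed in this guide.
VALID_STRATEGIES = {"PAGE", "VARIABLE", "SECTION", "PAGE_SECTIONS", "BLOCK"}


def build_chunking_config(strategy: str = "PAGE", target_chunk_size: int = 1,
                          use_parser: bool = True) -> dict:
    """Hypothetical helper that assembles a chunking payload."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if target_chunk_size < 1:
        raise ValueError("target_chunk_size must be at least 1")
    return {
        "use_chunk": True,
        "use_parser": use_parser,
        "chunking": {                      # hypothetical key name
            "strategy": strategy,          # hypothetical key name
            "target_chunk_size": target_chunk_size,
        },
    }


print(json.dumps(build_chunking_config("PAGE", 2), indent=2))
```

The helper does not check which strategies are allowed when use_parser is false, because that list is not given here.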
Chunking Strategies
Available strategies depend on whether the parser is enabled.
With Parser (use_parser: true)
All chunking strategies are available:
- PAGE
- VARIABLE
- SECTION
- PAGE_SECTIONS
- BLOCK
Chunk by Pages
Groups the document into page-based chunks.
When to use:
- Document data organized by pages
- Simple page-by-page processing
- Invoices with line items per page
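To make the PAGE behaviour concrete, here is a local sketch of grouping consecutive pages into chunks of target_chunk_size pages. It illustrates the idea only and is not the service's implementation; `chunk_by_pages` is a hypothetical name, and pages are represented as plain strings.

```python
from typing import List, Sequence


def chunk_by_pages(pages: Sequence[str], target_chunk_size: int = 1) -> List[List[str]]:
    """Group consecutive pages into chunks of `target_chunk_size` pages.

    The final chunk may contain fewer pages than the target.
    """
    if target_chunk_size < 1:
        raise ValueError("target_chunk_size must be at least 1")
    return [list(pages[i:i + target_chunk_size])
            for i in range(0, len(pages), target_chunk_size)]


print(chunk_by_pages(["p1", "p2", "p3", "p4", "p5"], 2))
# [['p1', 'p2'], ['p3', 'p4'], ['p5']]
```

With the default of 1, every page becomes its own chunk.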
Without Parser (use_parser: false)
target_chunk_size Parameter
The meaning of target_chunk_size varies by strategy:
| Strategy | target_chunk_size Meaning | Default |
|---|---|---|
| PAGE | Number of pages per chunk | 1 |
| VARIABLE | Target characters per chunk | 1000 |
| SECTION | Number of sections per chunk | 1 |
| PAGE_SECTIONS | Chunking granularity | 1 |
| BLOCK | Number of layout blocks per chunk | 1 |
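Because the same number is interpreted differently per strategy, it can help to carry the unit along with the value. This sketch simply transcribes the table into a lookup; `TARGET_CHUNK_SIZE` and `resolve_target_chunk_size` are hypothetical names, not part of any API.

```python
from typing import Optional, Tuple

# Units and defaults transcribed from the target_chunk_size table above.
TARGET_CHUNK_SIZE = {
    "PAGE": ("pages per chunk", 1),
    "VARIABLE": ("characters per chunk", 1000),
    "SECTION": ("sections per chunk", 1),
    "PAGE_SECTIONS": ("chunking granularity", 1),
    "BLOCK": ("layout blocks per chunk", 1),
}


def resolve_target_chunk_size(strategy: str,
                              value: Optional[int] = None) -> Tuple[int, str]:
    """Return (effective value, unit), falling back to the strategy's default."""
    if strategy not in TARGET_CHUNK_SIZE:
        raise ValueError(f"unknown strategy: {strategy}")
    unit, default = TARGET_CHUNK_SIZE[strategy]
    return (default if value is None else value, unit)


# The same number means very different things across strategies:
print(resolve_target_chunk_size("PAGE", 3))      # (3, 'pages per chunk')
print(resolve_target_chunk_size("VARIABLE", 3))  # (3, 'characters per chunk')
print(resolve_target_chunk_size("VARIABLE"))     # (1000, 'characters per chunk')
```

A value of 3 is a sensible PAGE setting but a nearly useless VARIABLE one, which is the main pitfall this table warns about.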
Strategy Selection Guide
Multi-page Invoices
Recommended: PAGE
Process 2-3 pages at a time (target_chunk_size of 2 or 3) for invoices whose line items span multiple pages.
Long Contracts
Recommended: SECTION
Chunk by contract sections or clauses to keep related context together.
Reports and Documents with Chapters
Recommended: SECTION or PAGE_SECTIONS
Maintain section context while respecting page boundaries.
Forms with Mixed Content
Recommended: BLOCK
Group related form blocks together.
ID Cards, Receipts (No Chunking)
Recommended: Disable chunking
Simple single-page documents don’t need chunking.
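The selection guide can be kept as a lookup table in your own integration code. In this sketch the document-type keys are illustrative labels of our own, an empty list stands for "disable chunking", and unknown types fall back to PAGE following the "Start Simple" best practice; none of this is a built-in feature.

```python
from typing import List

# Recommendations transcribed from the Strategy Selection Guide above.
# Keys are hypothetical labels; [] means "disable chunking".
RECOMMENDED = {
    "multi_page_invoice": ["PAGE"],
    "long_contract": ["SECTION"],
    "report_with_chapters": ["SECTION", "PAGE_SECTIONS"],
    "form_mixed_content": ["BLOCK"],
    "id_card": [],
    "receipt": [],
}


def recommend(doc_type: str) -> List[str]:
    """Return recommended strategies; unknown types default to PAGE."""
    return list(RECOMMENDED.get(doc_type, ["PAGE"]))


print(recommend("report_with_chapters"))  # ['SECTION', 'PAGE_SECTIONS']
print(recommend("receipt"))               # []
```

Returning a copy keeps callers from accidentally mutating the shared table.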
Complete Configuration Examples
Template with Chunking
Request with Chunking Override
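A common pattern is to store default chunking settings on a template and override some of them per request. The sketch below assumes the override is deep-merged into the template (request values win, untouched template values are kept); the service may instead replace the whole chunking block, so treat the merge semantics, the `merge_config` helper, and the "chunking"/"strategy" key names as assumptions.

```python
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `override` onto a copy of `base`; override wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical template defaults and per-request override.
template = {
    "use_parser": True,
    "use_chunk": True,
    "chunking": {"strategy": "PAGE", "target_chunk_size": 1},
}
request_override = {"chunking": {"target_chunk_size": 3}}

effective = merge_config(template, request_override)
print(effective)
# {'use_parser': True, 'use_chunk': True, 'chunking': {'strategy': 'PAGE', 'target_chunk_size': 3}}
```

Deep-copying the template means a request override never leaks back into the stored defaults.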
Chunking without Parser
Best Practices
Start Simple
Begin with the PAGE strategy and adjust if needed.
Match Document Structure
Choose a strategy that matches how the document is organized.
Test Chunk Sizes
Experiment with different target_chunk_size values to find what works best.
Monitor Results
Check extraction quality and adjust the strategy as needed.
Performance Considerations
| Strategy | Processing Speed | Memory Usage | Accuracy |
|---|---|---|---|
| PAGE | Fast | Low | Good |
| VARIABLE | Fast | Low | Good |
| SECTION | Medium | Medium | Better |
| PAGE_SECTIONS | Medium | Medium | Better |
| BLOCK | Slower | Higher | Best |
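If you need to trade speed against accuracy programmatically, the qualitative ratings above can be mapped to ordinal ranks. This sketch (hypothetical names throughout) picks the fastest strategies that meet a minimum accuracy rating; it ignores memory usage and only reflects the relative ratings in this table, not measured performance.

```python
from typing import List

# Ordinal ranks for the table's ratings: lower speed rank = faster,
# higher accuracy rank = more accurate.
SPEED = {"Fast": 0, "Medium": 1, "Slower": 2}
ACCURACY = {"Good": 0, "Better": 1, "Best": 2}

# (Processing Speed, Accuracy) transcribed from the table above.
PROFILE = {
    "PAGE": ("Fast", "Good"),
    "VARIABLE": ("Fast", "Good"),
    "SECTION": ("Medium", "Better"),
    "PAGE_SECTIONS": ("Medium", "Better"),
    "BLOCK": ("Slower", "Best"),
}


def fastest_meeting(min_accuracy: str) -> List[str]:
    """Return the fastest strategies rated at least `min_accuracy`."""
    floor = ACCURACY[min_accuracy]
    eligible = {name: SPEED[speed] for name, (speed, acc) in PROFILE.items()
                if ACCURACY[acc] >= floor}
    best = min(eligible.values())
    return [name for name, rank in eligible.items() if rank == best]


print(fastest_meeting("Better"))  # ['SECTION', 'PAGE_SECTIONS']
```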