What is the Parser?

The parser extracts document structure (blocks, layout) before AI extraction. This structured representation helps the AI understand document context and relationships between elements.
Parser Output Includes:
  • Text blocks with coordinates
  • Layout structure (headers, footers, tables)
  • Reading order
  • Document hierarchy

Parser Toggle

use_parser

Controls whether to parse document structure before extraction.
When to use:
  • Multi-page documents
  • Complex layouts (tables, forms)
  • Documents requiring high accuracy
  • When you need citations
  • When using advanced chunking strategies
Benefits:
  • Higher extraction accuracy
  • Better context understanding
  • Supports all chunking strategies
  • Enables citations
  • Handles complex layouts better
Trade-offs:
  • Slower processing
  • Higher cost per document
{
  "config": {
    "use_parser": true,
    "parser_mode": "PLUS"
  }
}

Parser Modes

When use_parser: true, you can select the parser quality level with parser_mode.

Available Modes

LITE

Fastest, lowest cost (Default)
Basic parsing for simple documents
  • Citations automatically disabled
  • Good for receipts, simple forms

PLUS

Balanced
Good accuracy for most documents
  • Handles tables and forms well
  • Best for general use

PRO

Highest accuracy
Best quality for complex documents
  • Advanced layout analysis
  • Best for contracts, complex invoices

Mode Comparison

Feature         | LITE        | PLUS         | PRO
Speed           | Fastest     | Fast         | Slower
Cost            | Lowest      | Medium       | Highest
Accuracy        | Good        | Better       | Best
Citations       | ❌ Disabled | ✅ Supported | ✅ Supported
Tables          | Basic       | Good         | Excellent
Complex Layouts | Basic       | Good         | Excellent
Best For        | Simple docs | General use  | Complex docs

Selecting Parser Mode

{
  "config": {
    "use_parser": true,
    "parser_mode": "PLUS"
  }
}
Valid values for parser_mode are LITE, PLUS, and PRO. If not specified, parser_mode defaults to LITE.
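The default and the allowed values can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `resolve_parser_mode` and the choice to raise `ValueError` on an unknown value are assumptions made for illustration.

```python
# Hypothetical client-side helper; the names here are illustrative only.
VALID_PARSER_MODES = ("LITE", "PLUS", "PRO")
DEFAULT_PARSER_MODE = "LITE"  # documented default when parser_mode is omitted


def resolve_parser_mode(config: dict) -> str:
    """Return the parser mode a config would use, applying the LITE default."""
    mode = config.get("parser_mode", DEFAULT_PARSER_MODE)
    if mode not in VALID_PARSER_MODES:
        # Assumption: rejecting unknown values locally; the service's own
        # behavior for invalid modes is not described in this document.
        raise ValueError(
            f"parser_mode must be one of {VALID_PARSER_MODES}, got {mode!r}"
        )
    return mode


print(resolve_parser_mode({"use_parser": True}))                         # LITE
print(resolve_parser_mode({"use_parser": True, "parser_mode": "PRO"}))  # PRO
```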

Impact on Other Features

Citations

Citations require use_parser: true. Additionally, citations are automatically disabled when parser_mode: "LITE".
✅ Citations will work (parser_mode is PLUS or PRO):
{
  "use_parser": true,
  "parser_mode": "PLUS",
  "enable_citations": true
}

❌ Citations disabled (LITE mode; enable_citations is ignored):
{
  "use_parser": true,
  "parser_mode": "LITE",
  "enable_citations": true
}

❌ Citations not supported (parser off; enable_citations is ignored):
{
  "use_parser": false,
  "enable_citations": true
}
For citation details, see the Citations Configuration.
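The two conditions above combine into a single check. The sketch below is a hypothetical client-side pre-check (the function name `citations_active` is invented for illustration). It assumes that a missing `use_parser` or `enable_citations` key counts as off, which this document does not state; the LITE default for `parser_mode` is documented.

```python
def citations_active(config: dict) -> bool:
    """Return True only if citations would take effect for this config."""
    # Assumption: absent keys are treated as False; the real defaults for
    # use_parser and enable_citations are not given in this document.
    use_parser = bool(config.get("use_parser", False))
    wants_citations = bool(config.get("enable_citations", False))
    # Documented: parser_mode defaults to LITE, and LITE disables citations.
    mode = config.get("parser_mode", "LITE")
    return use_parser and wants_citations and mode != "LITE"


examples = [
    {"use_parser": True, "parser_mode": "PLUS", "enable_citations": True},
    {"use_parser": True, "parser_mode": "LITE", "enable_citations": True},
    {"use_parser": False, "enable_citations": True},
]
for cfg in examples:
    print(citations_active(cfg))  # True, then False, then False
```

A check like this can flag a config that asks for citations but would have the flag silently ignored.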

Chunking Strategies

Whether the parser is enabled determines which chunking strategies are available.
With Parser (use_parser: true):
  • ✅ VARIABLE - Dynamic chunking by character size
  • ✅ SECTION - Chunk by document sections
  • ✅ PAGE - Chunk by pages
  • ✅ PAGE_SECTIONS - Hybrid page and section chunking
  • ✅ BLOCK - Chunk by layout blocks
Without Parser (use_parser: false):
  • ✅ PAGE - Only strategy available
For chunking details, see the Chunking Configuration.
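The availability rules above can be expressed as a small lookup. This is a sketch under the assumption that the strategy names are passed as the strings listed here; the function name `available_chunking_strategies` is hypothetical.

```python
from typing import List

# Strategy names as listed in this section.
ALL_STRATEGIES = ["VARIABLE", "SECTION", "PAGE", "PAGE_SECTIONS", "BLOCK"]


def available_chunking_strategies(use_parser: bool) -> List[str]:
    """Return the chunking strategies usable for the given parser toggle."""
    if use_parser:
        return list(ALL_STRATEGIES)
    # Without the parser, PAGE is the only strategy available.
    return ["PAGE"]


print(available_chunking_strategies(True))
print(available_chunking_strategies(False))  # ['PAGE']
```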

Template Configuration

Set parser mode in templates:
{
  "name": "Invoice Extractor",
  "schema": {...},
  "config": {
    "use_parser": true,
    "parser_mode": "PLUS"
  }
}

Request Level (Override Template)

A config block in the request overrides the template's setting. Here parser_mode is raised to PRO for this request only:
{
  "files": "https://example.com/document.pdf",
  "template_id": "INVOICE",
  "config": {
    "parser_mode": "PRO"
  }
}
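One way to picture the override is as a key-by-key merge in which request values win. The sketch below is a mental model only. It assumes that keys omitted from the request (such as `use_parser` in the example above) are inherited from the template, which this document implies but does not state outright. The function name `effective_config` is hypothetical.

```python
def effective_config(template_config: dict, request_config: dict) -> dict:
    """Merge configs so that request-level keys override template keys."""
    merged = dict(template_config)  # copy, so the template is not mutated
    merged.update(request_config)   # request values take precedence
    return merged


template = {"use_parser": True, "parser_mode": "PLUS"}
request = {"parser_mode": "PRO"}
print(effective_config(template, request))
# {'use_parser': True, 'parser_mode': 'PRO'}
```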