
What are Citations?

Citations track the source location of each extracted field in your document. Each extracted value includes references to the specific document blocks where the information was found.
Benefits:
  • Verification - Validate extracted data against source
  • Debugging - Identify why certain values were extracted
  • Audit Trail - Document evidence for compliance
  • Confidence Building - Show users where data came from

Enabling Citations

Set enable_citations: true in your configuration:
{
  "config": {
    "use_parser": true,
    "enable_citations": true
  }
}
Citations require use_parser: true. They are also automatically disabled when parser_mode is set to "LITE", even if enable_citations is true.

Citation Format

When citations are enabled, the output format changes: instead of a direct value, each extracted field becomes an object with a value and a citations list.
Default format (citations disabled) - direct values:
{
  "extraction": {
    "invoice_number": "INV-2024-001",
    "date": "2024-01-15",
    "total": 1500.00,
    "vendor": {
      "name": "Acme Corp",
      "address": "123 Main St"
    }
  }
}
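With citations enabled, each field is wrapped as an object with "value" and "citations" keys, as in the examples further down this page. Code written against the default format can keep working if the wrappers are stripped first. The following is a minimal sketch of that idea, not part of the product API: the helper names are hypothetical, and it assumes that nested objects such as vendor wrap each leaf field the same way.

```python
from typing import Any


def is_cited_field(node: Any) -> bool:
    """True if node looks like {"value": ..., "citations": [...]}."""
    return isinstance(node, dict) and set(node.keys()) == {"value", "citations"}


def strip_citations(node: Any) -> Any:
    """Recursively replace cited fields with their bare values."""
    if is_cited_field(node):
        return node["value"]
    if isinstance(node, dict):
        return {key: strip_citations(child) for key, child in node.items()}
    if isinstance(node, list):
        return [strip_citations(child) for child in node]
    return node


# Example data in the citation format; the nested "vendor" shape is an assumption.
cited = {
    "invoice_number": {"value": "INV-2024-001", "citations": ["b.1", "b.3"]},
    "vendor": {
        "name": {"value": "Acme Corp", "citations": ["b.2"]},
    },
}

direct = strip_citations(cited)
# direct == {"invoice_number": "INV-2024-001", "vendor": {"name": "Acme Corp"}}
```

A schema that itself defines an object with exactly the keys value and citations would be ambiguous under this check, so treat it as a convenience rather than a general converter.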

Citation References

Citations use a reference format to point to document elements:

Block References (b.X)

Format: b.{block_number}
Points to a specific text block in the parsed document.
{
  "value": "INV-2024-001",
  "citations": ["b.1", "b.3"]  // Found in blocks 1 and 3
}
What is a block?
  • A text block is a logical chunk of content identified by the parser
  • Examples: paragraph, table cell, header, footer, line item
  • Block numbers start from 0 and increment through the document
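Since a block reference is just the prefix b. followed by a block number, it can be turned into an integer index with a small parser. The sketch below is illustrative only. The function name parse_block_ref is hypothetical, and it assumes b.X is the only reference form, which is the only one this page describes.

```python
import re

# Matches the documented form b.{block_number}, e.g. "b.0" or "b.12".
_BLOCK_REF = re.compile(r"b\.(\d+)")


def parse_block_ref(ref: str) -> int:
    """Return the block number from a reference such as "b.3"."""
    match = _BLOCK_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a block reference: {ref!r}")
    return int(match.group(1))


block_numbers = [parse_block_ref(ref) for ref in ["b.1", "b.3"]]
# block_numbers == [1, 3]
```

Rejecting anything that does not match the pattern makes it obvious if a response ever contains a reference type this parser does not know about.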

Requirements and Limitations

Parser Required

Citations only work when use_parser: true. Direct vision extraction (use_parser: false) does not support citations.
// ✅ Citations will work
{
  "use_parser": true,
  "parser_mode": "PLUS",
  "enable_citations": true
}

// ❌ Citations ignored
{
  "use_parser": false,
  "enable_citations": true  // Ignored - parser required
}

Parser Mode Restriction

Citations are automatically disabled when using parser_mode: "LITE":
// ❌ Citations disabled in LITE mode
{
  "use_parser": true,
  "parser_mode": "LITE",
  "enable_citations": true  // Ignored - LITE mode disables citations
}

// ✅ Citations work in PLUS/PRO modes
{
  "use_parser": true,
  "parser_mode": "PLUS",  // or "PRO"
  "enable_citations": true
}
For parser mode details, see Parser Configuration.
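The two restrictions above can be combined into a client-side pre-flight check, so that a configuration whose enable_citations flag would be silently ignored is caught before a request is sent. This is a sketch under stated assumptions: citations_effective is a hypothetical helper, it expects the fully resolved configuration, it treats a missing enable_citations or use_parser as false, and it treats a missing parser_mode as a non-LITE mode. None of these defaults are confirmed by this page.

```python
def citations_effective(config: dict) -> bool:
    """Return True if this config would actually produce citations."""
    if not config.get("enable_citations", False):
        return False
    if not config.get("use_parser", False):
        # Direct vision extraction does not support citations.
        return False
    # Assumption: an absent parser_mode is not LITE.
    return config.get("parser_mode") != "LITE"


works = citations_effective(
    {"use_parser": True, "parser_mode": "PLUS", "enable_citations": True}
)
ignored_lite = citations_effective(
    {"use_parser": True, "parser_mode": "LITE", "enable_citations": True}
)
ignored_no_parser = citations_effective(
    {"use_parser": False, "enable_citations": True}
)
# works is True; ignored_lite and ignored_no_parser are False
```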

Using Citations

Verifying Extracted Data

Citations help you verify extraction accuracy by showing source locations:
  1. Get extraction with citations
  2. Review low-confidence fields - Check citation blocks
  3. Compare source blocks - Validate against original document
  4. Identify issues - Debug incorrect extractions
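The steps above can be partly automated by looking up the text of each cited block and checking whether the extracted value appears in it. The sketch below assumes you already have the parsed block text available as a list indexed by block number. This page does not describe how to obtain that list, so blocks and both helper functions are hypothetical.

```python
from typing import Dict, List


def block_number(ref: str) -> int:
    """Turn a reference such as "b.3" into the integer 3."""
    prefix, _, number = ref.partition(".")
    if prefix != "b" or not number.isdecimal():
        raise ValueError(f"unexpected citation reference: {ref!r}")
    return int(number)


def cited_block_text(field: Dict, blocks: List[str]) -> List[str]:
    """Return the text of every block cited by a field."""
    return [blocks[block_number(ref)] for ref in field["citations"]]


def value_found_in_sources(field: Dict, blocks: List[str]) -> bool:
    """Literal check: does the value appear in any cited block?"""
    needle = str(field["value"])
    return any(needle in text for text in cited_block_text(field, blocks))


# Hypothetical parsed blocks; block numbers start from 0.
blocks = ["ACME CORP", "Invoice No: INV-2024-001", "Date: 2024-01-15"]
field = {"value": "INV-2024-001", "citations": ["b.1"]}

found = value_found_in_sources(field, blocks)
# found is True
```

A literal substring match is only a first pass. Values that are normalized during extraction, such as dates or amounts, may not appear verbatim in the source block and still need a human look.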

Citation + Confidence Scores

Combine citations with confidence scores for robust validation:
{
  "extraction": {
    "invoice_number": {
      "value": "INV-2024-001",
      "citations": ["b.1"]
    }
  },
  "confidences": {
    "fields": {
      "invoice_number": 95.5  // High confidence + citation = likely accurate, spot check
    }
  }
}
Validation Strategy:
  • High confidence (>80) + citations - Likely accurate, spot check
  • Medium confidence (60-80) + citations - Review cited blocks
  • Low confidence (<60) + citations - Manual verification needed
For confidence score details, see Document Processing - Confidence Scores.
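The validation strategy above maps naturally onto a small triage function. The following sketch applies the three thresholds to a response shaped like the example in this section. The function names and action labels are hypothetical, and it assumes confidences.fields holds one score per top-level field.

```python
from typing import Dict, List, Tuple


def review_action(confidence: float) -> str:
    """Map a confidence score to the review tiers described above."""
    if confidence > 80:
        return "spot_check"
    if confidence >= 60:
        return "review_cited_blocks"
    return "manual_verification"


def triage(response: Dict) -> Dict[str, Tuple[str, List[str]]]:
    """Pair each scored field with its review action and citations."""
    extraction = response["extraction"]
    plan = {}
    for name, score in response["confidences"]["fields"].items():
        field = extraction.get(name)
        # Only wrapped scalar fields carry a citations list directly.
        citations = field.get("citations", []) if isinstance(field, dict) else []
        plan[name] = (review_action(score), citations)
    return plan


response = {
    "extraction": {
        "invoice_number": {"value": "INV-2024-001", "citations": ["b.1"]},
    },
    "confidences": {"fields": {"invoice_number": 95.5}},
}

plan = triage(response)
# plan == {"invoice_number": ("spot_check", ["b.1"])}
```

Sorting the resulting plan so that manual_verification entries come first gives reviewers the cited blocks for the riskiest fields before anything else.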

Configuration Examples

Template with Citations

{
  "name": "Invoice Extractor",
  "schema": {...},
  "config": {
    "use_parser": true,
    "parser_mode": "PLUS",
    "enable_citations": true
  }
}

Request with Citations

{
  "files": "https://example.com/invoice.pdf",
  "template_id": "INVOICE",
  "config": {
    "enable_citations": true
  }
}

Disabling Citations (Override Template)

{
  "files": "https://example.com/simple-receipt.pdf",
  "template_id": "INVOICE",
  "config": {
    "enable_citations": false  // Disable for faster processing
  }
}
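The request examples above suggest that request-level config keys take precedence over the template's config, while keys the request omits (such as use_parser) are inherited from the template. A rough mental model of that behavior is a key-by-key override, sketched below. This is an assumption for illustration only: resolve_config is a hypothetical name, and the actual server-side merge rules are not specified on this page.

```python
def resolve_config(template_config: dict, request_config: dict) -> dict:
    """Assumed model: request keys override template keys one by one."""
    return {**template_config, **request_config}


template_config = {
    "use_parser": True,
    "parser_mode": "PLUS",
    "enable_citations": True,
}
request_config = {"enable_citations": False}  # disable for this request only

effective = resolve_config(template_config, request_config)
# effective == {"use_parser": True, "parser_mode": "PLUS", "enable_citations": False}
```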

Array Fields with Citations

Citations work with array fields too:
{
  "extraction": {
    "line_items": [
      {
        "description": {
          "value": "Widget A",
          "citations": ["b.10"]
        },
        "quantity": {
          "value": 5,
          "citations": ["b.10"]
        },
        "price": {
          "value": 100.00,
          "citations": ["b.10", "b.11"]
        }
      },
      {
        "description": {
          "value": "Widget B",
          "citations": ["b.12"]
        },
        "quantity": {
          "value": 3,
          "citations": ["b.12"]
        },
        "price": {
          "value": 200.00,
          "citations": ["b.12"]
        }
      }
    ]
  }
}
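For a verification UI or an audit log it is often handy to flatten a nested result like the one above into a map from field path to citations, and then invert it to see which fields each block supports. The sketch below walks the extraction recursively. The function name and the path syntax (line_items[0].price) are hypothetical conventions chosen here, not something the API returns.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple


def iter_citations(node: Any, path: str = "") -> Iterator[Tuple[str, List[str]]]:
    """Yield (field_path, citations) for every cited field under node."""
    if isinstance(node, dict):
        if set(node) == {"value", "citations"}:
            yield path, node["citations"]
            return
        for key, child in node.items():
            yield from iter_citations(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_citations(child, f"{path}[{index}]")


extraction = {
    "line_items": [
        {
            "description": {"value": "Widget A", "citations": ["b.10"]},
            "price": {"value": 100.00, "citations": ["b.10", "b.11"]},
        }
    ]
}

citations_by_field: Dict[str, List[str]] = dict(iter_citations(extraction))

# Invert the map: which fields does each block support?
fields_by_block: Dict[str, List[str]] = defaultdict(list)
for field_path, refs in citations_by_field.items():
    for ref in refs:
        fields_by_block[ref].append(field_path)
```

With the data above, citations_by_field maps line_items[0].price to ["b.10", "b.11"], and fields_by_block["b.10"] lists both the description and the price of the first line item.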

When to Use Citations

Use citations for:
  • Compliance and audit requirements
  • High-value financial documents
  • Legal documents requiring verification
  • Debugging extraction issues
  • Building user-facing verification UI
  • Manual review workflows

Skip citations when:
  • Processing high-volume simple documents (receipts, ID cards)
  • Speed and cost are priorities
  • Source verification is not needed
  • Using parser_mode LITE for performance
  • Documents come from trusted sources and extract with high confidence

Best Practices

  • Enable for Verification - Use citations for documents requiring manual review
  • Combine with Confidence - Check citations for low-confidence fields first
  • Use PLUS or PRO Mode - Citations require parser_mode PLUS or PRO
  • Consider Performance - Disable citations for high-volume simple docs

Enable citations during development and testing, then decide per document type for production based on verification needs vs. performance requirements.
