
Overview

Document Classification lets you organize and route documents automatically based on their type and structure. It streamlines document workflows by identifying document categories and extracting specific sections from multi-part documents.
Document Classification is coming soon. This feature is currently in development and will be available in a future release. Contact [email protected] to be notified when Classification becomes available.

Classification Modes

Document Type Classification

Classify documents into predefined categories using custom rules (Coming Soon)

Document Splitting

Split multi-section documents into logical parts based on your rules (Coming Soon)

How It Works

Document Type Classification

Automatically categorize documents into predefined types based on rules you define.
1. Define Categories

Create a list of document types you need to identify (e.g., Invoice, Receipt, Contract, ID Card)

2. Configure Rules

Define classification rules that identify each document type based on content patterns, layout features, or specific text

3. Submit Document

Upload or reference a document for classification

4. Receive Classification

Get back the document type classification with confidence scores
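
Because the Classification API is not yet published, the flow above can only be illustrated locally. The following is a minimal sketch, not the product's implementation: the `Rule` class, the `classify` function, the regex patterns, and the result keys are all hypothetical, and the confidence value (the fraction of a category's patterns found in the text) is an assumed scoring scheme chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule: a category plus regex patterns that suggest it."""
    category: str
    patterns: list


def classify(text, rules):
    """Score each category by the fraction of its patterns found in the text."""
    scores = {}
    for rule in rules:
        hits = sum(1 for p in rule.patterns if re.search(p, text, re.IGNORECASE))
        if hits:
            scores[rule.category] = hits / len(rule.patterns)

    # No rule matched at all: report the document as unclassified.
    if not scores:
        return {"type": "unclassified", "confidence": 0.0, "alternatives": []}

    # Highest score wins; the remaining matches become alternatives.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, confidence = ranked[0]
    alternatives = [{"type": c, "confidence": s} for c, s in ranked[1:]]
    return {"type": best, "confidence": confidence, "alternatives": alternatives}


# Steps 1 and 2: categories and their rules (illustrative patterns only).
RULES = [
    Rule("Invoice", [r"\binvoice\b", r"\bamount due\b"]),
    Rule("Receipt", [r"\breceipt\b", r"\bpaid\b"]),
]

# Steps 3 and 4: submit text and read back the classification.
result = classify("INVOICE #1042\nAmount due: 40.00", RULES)
print(result)
```

A real rule set would likely also weigh layout features, which plain-text matching like this cannot capture.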

Document Splitting

Split complex multi-section documents into separate logical parts.
1. Define Sections

Specify the section types you want to extract (e.g., Balance Sheet, Income Statement, Cash Flow Statement)

2. Create Splitting Rules

Define rules to identify section boundaries based on headers, page breaks, or content patterns

3. Submit Document

Upload the multi-section document for splitting

4. Receive Split Results

Get back page ranges for each identified section, with sections that don’t match any rules marked as unclassified
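
The splitting flow above can be sketched the same way. This is an illustrative local example under stated assumptions, not the product's behavior: `SECTION_RULES`, `match_section`, `split_document`, and the `start_page`/`end_page` keys are hypothetical names, and the only boundary signal used here is a header at the start of a line. A page without a matching header is assumed to continue the section before it.

```python
import re

# Hypothetical splitting rules: section label -> regex for the header that opens it.
SECTION_RULES = {
    "Balance Sheet": r"^balance sheet\b",
    "Income Statement": r"^income statement\b",
    "Cash Flow Statement": r"^cash flow statement\b",
}


def match_section(page_text, rules):
    """Return the label of the first rule whose header appears on the page."""
    for label, pattern in rules.items():
        if re.search(pattern, page_text, re.IGNORECASE | re.MULTILINE):
            return label
    return None


def split_document(pages, rules):
    """Group 1-indexed pages into contiguous sections with page ranges."""
    sections = []
    for page_no, text in enumerate(pages, start=1):
        label = match_section(text, rules)
        if not sections:
            # Pages before the first recognized header are unclassified.
            first = label or "unclassified"
            sections.append({"section": first, "start_page": page_no, "end_page": page_no})
        elif label is None or label == sections[-1]["section"]:
            # No new header: the page extends the current section.
            sections[-1]["end_page"] = page_no
        else:
            sections.append({"section": label, "start_page": page_no, "end_page": page_no})
    return sections


pages = [
    "Annual Report",
    "Balance Sheet\nAssets ...",
    "Liabilities ...",
    "Income Statement\nRevenue ...",
    "Cash Flow Statement\nOperating activities ...",
]
split_result = split_document(pages, SECTION_RULES)
for part in split_result:
    print(part)
```

One limitation of this simplification is that trailing pages can never be marked unclassified, since every header-less page is attached to the preceding section.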

Use Cases

Invoice Processing

Classify incoming documents as Invoice, Purchase Order, Receipt, or Credit Note to route to appropriate accounting workflows

Identity Verification

Identify document types such as Passport, Driver’s License, or National ID to apply document-specific extraction templates

Financial Statements

Split comprehensive financial reports into Balance Sheet, Income Statement, and Cash Flow Statement for targeted analysis

Legal Contracts

Divide lengthy contracts into Preamble, Terms & Conditions, Signatures, and Appendices for efficient review

Healthcare Records

Sort medical documents into Lab Results, Prescriptions, Insurance Claims, and Referrals for proper patient record routing

Batch Scanning

Split bulk scanned documents into individual files based on separator pages or content detection

Key Capabilities

User-Defined Rules

Classification and splitting are based on rules you define, giving you precise control over how documents are categorized and divided. Rules can leverage:
  • Content patterns and specific text markers
  • Document layout and structure
  • Header and section formatting
  • Page boundaries and separators

Classification Output

When classifying documents, you’ll receive:
  • Identified document type from your predefined categories
  • Confidence score indicating classification certainty
  • Alternative classifications if multiple types match
  • Unclassified status for documents that don’t match any rules
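
To show how these output fields might be consumed, here is a hedged sketch of a routing decision. The response schema has not been published, so `ClassificationResult`, its field names, the `route` function, and both thresholds are assumptions made for illustration. The idea is to send a document to manual review when it is unclassified, when confidence is low, or when an alternative classification scores almost as high as the winner.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassificationResult:
    """Hypothetical result shape mirroring the fields listed above."""
    document_type: Optional[str]  # None stands for "unclassified"
    confidence: float = 0.0
    alternatives: list = field(default_factory=list)  # (type, confidence) pairs


def route(result, min_confidence=0.8, min_margin=0.15):
    """Return the workflow name for a result, or 'manual_review' when unsure."""
    if result.document_type is None:
        return "manual_review"
    if result.confidence < min_confidence:
        return "manual_review"
    # An alternative that scores nearly as high makes the top choice ambiguous.
    runner_up = max((score for _, score in result.alternatives), default=0.0)
    if result.confidence - runner_up < min_margin:
        return "manual_review"
    return result.document_type


print(route(ClassificationResult("Invoice", 0.95, [("Credit Note", 0.40)])))
print(route(ClassificationResult("Passport", 0.90, [("National ID", 0.85)])))
print(route(ClassificationResult(None)))
```

The threshold values are placeholders; suitable values would depend on how the released feature calibrates its confidence scores.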

Splitting Output

When splitting documents, you’ll receive:
  • Page ranges for each identified section
  • Section type labels based on your definitions
  • Unmatched pages that don’t fit any section criteria
  • Ability to process each section independently with targeted templates
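
As an illustration of processing each section independently, the sketch below turns a split result into per-section extraction jobs. Everything here is hypothetical: the split result shape, the `SECTION_TEMPLATES` mapping, the template names, and `plan_extraction` are assumptions, since the actual output format is not yet available.

```python
# Hypothetical mapping from section label to an extraction template name.
SECTION_TEMPLATES = {
    "Balance Sheet": "balance_sheet_template",
    "Income Statement": "income_statement_template",
}


def plan_extraction(split_result, templates):
    """Pair each section's pages with its template; collect pages that have none."""
    jobs, unmatched_pages = [], []
    for part in split_result:
        pages = list(range(part["start_page"], part["end_page"] + 1))
        template = templates.get(part["section"])
        if template is None:
            unmatched_pages.extend(pages)
        else:
            jobs.append({"section": part["section"], "pages": pages, "template": template})
    return jobs, unmatched_pages


example_split = [
    {"section": "unclassified", "start_page": 1, "end_page": 2},
    {"section": "Balance Sheet", "start_page": 3, "end_page": 4},
    {"section": "Income Statement", "start_page": 5, "end_page": 5},
]
jobs, unmatched_pages = plan_extraction(example_split, SECTION_TEMPLATES)
print(jobs)
print(unmatched_pages)
```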

Best Practices

When Classification becomes available, define clear, specific rules for each document type. Use distinctive features like headers, layouts, or specific text patterns to ensure accurate classification. Test rules with diverse document samples to handle variations in format and quality.
Prepare your classification taxonomy now by identifying the document types you need to distinguish and the criteria that differentiate them. A well-organized category structure will shorten setup when the feature launches.
For document splitting, establish clear rules for where sections begin and end. Consider using multiple indicators (headers, page numbers, content patterns) to improve split accuracy, especially for documents with varying formats.
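
One way to combine multiple indicators is a simple vote, sketched below. This is an assumed approach for illustration, not a description of how the feature will work: `is_section_start`, its parameters, the three indicators, and the two-of-three threshold are all hypothetical.

```python
import re


def is_section_start(page_text, header_pattern, content_pattern, min_votes=2):
    """Treat a page as a section start when enough independent indicators agree."""
    lines = page_text.strip().splitlines()
    first_line = lines[0] if lines else ""
    indicators = [
        # Header: the section title opens the page.
        bool(re.match(header_pattern, first_line, re.IGNORECASE)),
        # Page numbers: numbering restarts at 1.
        bool(re.search(r"\bpage 1 of \d+\b", page_text, re.IGNORECASE)),
        # Content pattern: vocabulary typical of the section appears.
        bool(re.search(content_pattern, page_text, re.IGNORECASE)),
    ]
    return sum(indicators) >= min_votes


HEADER = r"income statement"
CONTENT = r"\b(revenue|net income)\b"

print(is_section_start("Income Statement\nRevenue 100\nPage 1 of 3", HEADER, CONTENT))
print(is_section_start("Notes mention the income statement and revenue", HEADER, CONTENT))
```

Requiring agreement between indicators keeps a passing mention of a section title in body text from being treated as a boundary.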
Plan to use Classification together with Templates. Once documents are classified or split, apply type-specific or section-specific extraction templates to pull the relevant data from each part.

Availability

Document Classification is currently under development. API endpoints and specific implementation details are not yet available. This documentation provides a preview of planned capabilities. Features and functionality may change before release.
