
What are Templates?

Templates are reusable extraction schemas that define what data to extract from specific document types. Instead of providing a schema with each request, you can create templates once and reference them by ID.

Benefits of Templates

Reusability

Define once, use many times across all your documents

Consistency

Ensure consistent extraction across all documents of the same type

Versioning

Track changes and maintain multiple versions

Team Collaboration

Share templates across your organization

Creating a Template

Templates require:
  • Name: Descriptive identifier for the template
  • Schema: JSON Schema defining fields to extract
  • Description: Purpose and use case
Optional fields:
  • Custom ID: Readable identifier instead of UUID
  • Instruction: Additional guidance for the AI
  • Config: Advanced processing settings (parser, chunking, citations)
For complete API details and request/response examples, see the Template API Reference.
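As a rough illustration of the required and optional fields listed above, the sketch below assembles a template payload in Python before it is sent to the API. The keys name, description, schema, and custom_id match the template example later in this guide; instruction and config are assumed key names for the other optional fields, and build_template is a hypothetical helper, not part of any SDK.

```python
import json

# Field names follow the template example in this guide; "instruction" and
# "config" are assumed key names for the optional fields listed above.
REQUIRED_KEYS = ("name", "schema", "description")
OPTIONAL_KEYS = ("custom_id", "instruction", "config")


def build_template(name, schema, description, **optional):
    """Assemble a template payload, rejecting empty required or unknown fields."""
    payload = {"name": name, "description": description, "schema": schema}
    missing = [key for key in REQUIRED_KEYS if not payload[key]]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = sorted(set(optional) - set(OPTIONAL_KEYS))
    if unknown:
        raise ValueError(f"unknown template fields: {unknown}")
    payload.update(optional)
    return payload


template = build_template(
    name="Invoice Extractor",
    description="Extract the invoice number and total",
    schema={
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string", "description": "Unique invoice identifier"},
            "total": {"type": "number", "description": "Total invoice amount including tax"},
        },
        "required": ["invoice_number", "total"],
    },
    custom_id="INVOICE",
)
print(json.dumps(template, indent=2))
```

Checking the payload locally catches a missing description or a misspelled optional key before the request is made. The endpoint and authentication are left out here because they are covered in the Template API Reference.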

Using Templates

Once created, a template can be referenced in your predict requests by its UUID or by its custom ID, instead of providing a schema. Either value goes in the template_id field.
Using UUID:
  • template_id: "550e8400-e29b-41d4-a716-446655440000"
Using custom ID:
  • template_id: "INVOICE" (more readable)
For complete request/response examples, see the Predict Async API Reference.
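The two reference styles above can be sketched in Python as follows. Only the template_id field comes from this guide; reference_kind and predict_body are hypothetical helpers, and the remaining predict request fields (such as the document itself) are omitted because they are described in the Predict Async API Reference.

```python
import uuid


def reference_kind(template_id: str) -> str:
    """Classify a template reference as 'uuid' or 'custom_id'."""
    try:
        uuid.UUID(template_id)
    except ValueError:
        return "custom_id"
    return "uuid"


def predict_body(template_id: str) -> dict:
    """Build the template-related part of a predict request body."""
    if not template_id:
        raise ValueError("template_id must not be empty")
    # No "schema" key here: the referenced template supplies the schema.
    return {"template_id": template_id}


for ref in ("550e8400-e29b-41d4-a716-446655440000", "INVOICE"):
    print(reference_kind(ref), predict_body(ref))
```

Note that uuid.UUID also accepts 32 hexadecimal characters without hyphens, so a custom ID in that shape would be classified as a UUID by this helper.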

Managing Templates

Templates can be listed, retrieved, updated, and deleted through the API.
Available operations:
  • List Templates - Get paginated list with optional search/filtering
  • Get Template - Retrieve full template details including schema
  • Update Template - Modify name, description, schema, or config
  • Delete Template - Soft delete (can be restored)
For complete API details, see the Template Management API Reference.

Auto-Schema Generation

Generate JSON schemas automatically from natural language descriptions. Describe the fields you want to extract, and the API creates a properly formatted schema.
Example input: “Extract invoice number, date, vendor name, total amount, and line items with description and quantity”
Output: A properly formatted JSON Schema, ready to use in template creation.
For complete API details, see the Auto-Schema Generation API Reference.
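To give a sense of the result, here is a hand-written Python illustration of the kind of schema the example input could map to. It is not actual API output: the field names, types, and descriptions are assumptions chosen to follow the schema design rules in this guide.

```python
# Hand-written illustration only; the real generated schema may differ.
generated_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Unique invoice identifier"},
        "date": {"type": "string", "description": "Date when invoice was issued"},
        "vendor_name": {"type": "string", "description": "Name of the vendor"},
        "total_amount": {"type": "number", "description": "Total invoice amount"},
        "line_items": {
            "type": "array",
            "description": "List of items on the invoice",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Item description"},
                    "quantity": {"type": "integer", "description": "Number of units"},
                },
                "required": ["description", "quantity"],
            },
        },
    },
    "required": ["invoice_number", "date", "vendor_name", "total_amount", "line_items"],
}

# Quick sanity check: every top-level property is listed as required.
unlisted = set(generated_schema["properties"]) - set(generated_schema["required"])
print("properties missing from required:", sorted(unlisted))
```

Whatever the API returns, it is worth reviewing the generated field names and descriptions before saving the schema in a template.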

Template Configuration Options

Templates can include configuration to control processing behavior. These settings become defaults for all documents processed with the template.
For all configuration options, examples, and detailed explanations, see the Configuration Overview.

Schema Design Best Practices

Every property must include both type and description. Use clear, specific field names and descriptions.
Good Example:
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Unique invoice identifier"
    },
    "invoice_date": {
      "type": "string",
      "description": "Date when invoice was issued"
    },
    "total_amount": {
      "type": "number",
      "description": "Total invoice amount including tax"
    }
  }
}
Bad Example:
{
  "type": "object",
  "properties": {
    "num": {
      "type": "string",
      "description": "Number"
    },
    "date": {
      "type": "string",
      "description": "Date"
    },
    "amt": {
      "type": "number",
      "description": "Amount"
    }
  }
}
Clear field names and descriptions help the AI understand what to extract and improve extraction accuracy.
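The type-and-description rule can be checked mechanically before a template is created. The following Python sketch walks a schema and reports properties that lack a description or a type; find_undocumented is a hypothetical helper, and it treats anyOf as an acceptable substitute for type, which is an assumption based on the nullable fields shown later in this guide.

```python
def find_undocumented(schema, path=""):
    """Return paths of properties lacking a type (or anyOf) or a description."""
    problems = []
    for name, prop in schema.get("properties", {}).items():
        here = f"{path}.{name}" if path else name
        has_type = "type" in prop or "anyOf" in prop
        has_description = bool(prop.get("description", "").strip())
        if not (has_type and has_description):
            problems.append(here)
        # Recurse into nested objects and into the items of arrays.
        if prop.get("type") == "object":
            problems += find_undocumented(prop, here)
        elif prop.get("type") == "array" and isinstance(prop.get("items"), dict):
            problems += find_undocumented(prop["items"], here + "[]")
    return problems


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Unique invoice identifier"},
        "total": {"type": "number"},  # description missing
        "vendor": {
            "type": "object",
            "description": "Vendor/seller information",
            "properties": {"name": {"type": "string"}},  # description missing
        },
    },
}
print(find_undocumented(schema))
```

A check like this only confirms that descriptions exist. Whether a description is specific enough (for example "Unique invoice identifier" rather than "Number") still needs a human review.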
Use appropriate JSON Schema data types for your fields.
Available Types:
  • string - Text values (names, addresses, IDs, dates as ISO strings)
  • number - Numeric values including decimals (prices, percentages, measurements)
  • integer - Whole numbers only (counts, quantities, IDs)
  • boolean - True/false values (flags, status indicators)
  • array - Lists of items (line items, transactions, tags)
  • object - Nested structures (addresses, contact info, metadata)
Constraints:
  • enum - Restrict values to a specific set of allowed options
Example:
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string"
    },
    "total": {
      "type": "number"
    },
    "quantity": {
      "type": "integer"
    },
    "is_paid": {
      "type": "boolean"
    },
    "status": {
      "type": "string",
      "enum": ["pending", "approved", "rejected"]
    },
    "priority": {
      "type": "string",
      "enum": ["low", "medium", "high"]
    },
    "issue_date": {
      "type": "string"
    },
    "line_items": {
      "type": "array"
    }
  }
}
Use enum for fields with a fixed set of possible values like status codes, categories, or priority levels. This helps improve extraction accuracy for predefined options.
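Enum constraints are also easy to verify after extraction. The Python sketch below compares an extracted record against the enum lists in a schema; it assumes results come back as a dictionary keyed by field name, and enum_violations is a hypothetical helper that only inspects top-level fields.

```python
def enum_violations(schema, record):
    """Map each top-level field to its value when it falls outside the enum."""
    violations = {}
    for name, prop in schema.get("properties", {}).items():
        allowed = prop.get("enum")
        if allowed is not None and name in record and record[name] not in allowed:
            violations[name] = record[name]
    return violations


schema = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["pending", "approved", "rejected"]},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        "total": {"type": "number"},
    },
}
record = {"status": "Approved", "priority": "high", "total": 120.5}
print(enum_violations(schema, record))
```

The comparison is case-sensitive, so "Approved" is reported even though "approved" is allowed.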
Schemas support nested objects and arrays up to 3 levels deep.
Example (3 levels):
{
  "type": "object",
  "properties": {
    "vendor": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "address": {
          "type": "object",
          "properties": {
            "street": {
              "type": "string"
            },
            "city": {
              "type": "string"
            }
          }
        }
      }
    },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "product": {
            "type": "string"
          },
          "quantity": {
            "type": "number"
          }
        }
      }
    }
  }
}
Maximum nesting depth is 3 levels. Deeper nesting may result in extraction errors or incomplete data.
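The depth limit can be checked before a schema is submitted. This Python sketch counts nested object levels, with the root object as level 1 and an array counted through the objects in its items. That counting rule is an assumption inferred from the examples in this guide (root, vendor, address gives 3), and schema_depth is a hypothetical helper.

```python
MAX_DEPTH = 3  # limit stated in this guide


def schema_depth(node):
    """Count nested object levels; the root object is level 1."""
    variants = node.get("anyOf")
    if variants:
        # Nullable fields: use the deepest variant.
        return max(schema_depth(variant) for variant in variants)
    if node.get("type") == "object":
        children = node.get("properties", {}).values()
        return 1 + max((schema_depth(child) for child in children), default=0)
    if node.get("type") == "array":
        # An array adds no level itself; objects inside it do.
        return schema_depth(node.get("items", {}))
    return 0


schema = {
    "type": "object",
    "properties": {
        "vendor": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {
                    "type": "object",
                    "properties": {"street": {"type": "string"}, "city": {"type": "string"}},
                },
            },
        },
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"product": {"type": "string"}}},
        },
    },
}
depth = schema_depth(schema)
print(depth, "within limit" if depth <= MAX_DEPTH else "too deep")
```

If a schema exceeds the limit, flattening is usually the simplest fix, for example replacing a nested address object with vendor_street and vendor_city fields.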
All fields must be listed in the required array. To make a field optional (nullable), use anyOf to allow both the field type and null.
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string"
    },
    "total": {
      "type": "number"
    },
    "notes": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ]
    },
    "discount": {
      "anyOf": [
        { "type": "number" },
        { "type": "null" }
      ]
    }
  },
  "required": ["invoice_number", "total", "notes", "discount"]
}
In this example:
  • All fields are in the required array (OpenAI requirement)
  • invoice_number and total must have values
  • notes and discount can be null if not found in the document
All properties must be included in the required array. Use anyOf with null type for fields that may not always be present.
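The required-plus-nullable pattern can be sketched in Python with two small helpers: one builds a nullable property with anyOf, the other lists properties that are missing from a required array at any nesting level. Both nullable and missing_from_required are hypothetical names used for illustration.

```python
def nullable(type_name, description=None):
    """Build a property that accepts the given type or null."""
    prop = {"anyOf": [{"type": type_name}, {"type": "null"}]}
    if description:
        prop["description"] = description
    return prop


def missing_from_required(schema, path=""):
    """Return paths of properties not listed in their object's required array."""
    missing = []
    if schema.get("type") == "object":
        required = set(schema.get("required", []))
        for name, prop in schema.get("properties", {}).items():
            here = f"{path}.{name}" if path else name
            if name not in required:
                missing.append(here)
            missing += missing_from_required(prop, here)
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        missing += missing_from_required(schema["items"], path + "[]")
    return missing


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "notes": nullable("string", "Additional notes or comments"),
    },
    "required": ["invoice_number"],  # "notes" forgotten on purpose
}
print(missing_from_required(schema))

schema["required"].append("notes")
print(missing_from_required(schema))
```

The first call reports notes, and the second reports nothing once notes is added to required. The field still tolerates missing data because its anyOf allows null.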

Template Example

This example demonstrates all schema best practices in one comprehensive template:
{
  "name": "Invoice Extractor",
  "description": "Extract invoice data with vendor details and line items",
  "custom_id": "INVOICE_V1",
  "schema": {
    "type": "object",
    "properties": {
      "invoice_number": {
        "type": "string",
        "description": "Unique invoice identifier"
      },
      "invoice_date": {
        "type": "string",
        "description": "Date when invoice was issued"
      },
      "status": {
        "type": "string",
        "enum": ["pending", "paid", "overdue"],
        "description": "Current payment status"
      },
      "vendor": {
        "type": "object",
        "description": "Vendor/seller information",
        "properties": {
          "name": {
            "type": "string",
            "description": "Vendor company name"
          },
          "tax_id": {
            "type": "string",
            "description": "Vendor tax identification number"
          }
        },
        "required": ["name", "tax_id"]
      },
      "line_items": {
        "type": "array",
        "description": "List of items or services",
        "items": {
          "type": "object",
          "properties": {
            "description": {
              "type": "string",
              "description": "Item or service description"
            },
            "quantity": {
              "type": "integer",
              "description": "Number of units"
            },
            "unit_price": {
              "type": "number",
              "description": "Price per unit"
            }
          },
          "required": ["description", "quantity", "unit_price"]
        }
      },
      "subtotal": {
        "type": "number",
        "description": "Total before tax"
      },
      "tax": {
        "type": "number",
        "description": "Tax amount"
      },
      "total": {
        "type": "number",
        "description": "Total amount including tax"
      },
      "notes": {
        "anyOf": [
          { "type": "string" },
          { "type": "null" }
        ],
        "description": "Additional notes or comments"
      },
      "is_paid": {
        "type": "boolean",
        "description": "Whether invoice has been paid"
      }
    },
    "required": [
      "invoice_number",
      "invoice_date",
      "status",
      "vendor",
      "line_items",
      "subtotal",
      "tax",
      "total",
      "notes",
      "is_paid"
    ]
  }
}
This example demonstrates:
  • ✅ Descriptive field names with descriptions for every property
  • ✅ Multiple data types: string, number, integer, boolean, array, object
  • ✅ Enum constraint for status field with predefined values
  • ✅ Nested objects (vendor) and arrays (line_items) - 2 levels deep
  • ✅ Optional/nullable field (notes) using anyOf with null
  • ✅ All properties in required array (OpenAI requirement)
  • ✅ Custom ID for easy reference instead of UUID

Versioning Templates

Maintain multiple versions of templates:
  1. Use Custom IDs with Versions
    invoice_v1
    invoice_v2
    invoice_v3
    
  2. Include Version in Name
    Invoice Extractor v1.0
    Invoice Extractor v2.0
    
  3. Update Description with Changelog
    "description": "Invoice extractor v2 - Added tax breakdown and discount fields"
    
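If versions are encoded in custom IDs as in the first option, the next ID can be derived rather than typed by hand. This Python sketch assumes the name_vN convention shown above (in either case, such as invoice_v1 or INVOICE_V1); next_version_id is a hypothetical helper.

```python
import re


def next_version_id(custom_id: str) -> str:
    """Return the custom ID with its trailing version number incremented."""
    match = re.fullmatch(r"(.+_[vV])(\d+)", custom_id)
    if match is None:
        raise ValueError(f"no _vN version suffix in {custom_id!r}")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1}"


for current in ("invoice_v1", "INVOICE_V1", "invoice_v9"):
    print(current, "->", next_version_id(current))
```

Creating the new version as a separate template under the incremented ID keeps the previous version available to requests that still reference it.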
