
Schemas

A schema defines the structure of data you want extracted from documents. Each schema has a name, description, and a list of fields with types and validation rules.

Create a schema

curl -X POST http://localhost:4000/api/schema \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Invoice",
"description": "Standard invoice extraction",
"fields": [
{
"name": "vendor_name",
"type": "text",
"required": true,
"description": "Name of the vendor or supplier"
},
{
"name": "line_items",
"type": "object",
"required": false,
"isArray": true,
"description": "Individual line items",
"properties": [
{ "name": "description", "type": "text", "required": true },
{ "name": "amount", "type": "number", "required": true }
]
}
]
}'
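If you call the API from Python instead of curl, you can build the same request with the standard library. This is a minimal sketch, not an official client. `build_create_schema_request` is a hypothetical helper name. It only builds the request, so you can inspect it before sending.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"  # same host as the curl examples
TOKEN = "YOUR_TOKEN"  # placeholder, as in the curl examples


def build_create_schema_request(schema: dict, token: str = TOKEN) -> urllib.request.Request:
    """Build (but do not send) a POST /api/schema request (hypothetical helper)."""
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/schema",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Same payload as the curl example above.
invoice_schema = {
    "name": "Invoice",
    "description": "Standard invoice extraction",
    "fields": [
        {"name": "vendor_name", "type": "text", "required": True,
         "description": "Name of the vendor or supplier"},
        {"name": "line_items", "type": "object", "required": False, "isArray": True,
         "description": "Individual line items",
         "properties": [
             {"name": "description", "type": "text", "required": True},
             {"name": "amount", "type": "number", "required": True},
         ]},
    ],
}

req = build_create_schema_request(invoice_schema)
# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```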

Field types

| Type | Description |
| --- | --- |
| text | String values |
| number | Numeric values |
| date | Date strings |
| boolean | True/false |
| object | Nested structure with properties |

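Any field can hold a list of values by setting "isArray": true. For an object field such as line_items, each array element follows the field's properties.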
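You can catch obvious mistakes locally before sending a schema. The sketch below checks field definitions against the type table. `validate_field` is hypothetical. Two of its rules are assumptions drawn from the examples on this page, not the server's documented validation: object fields must have `properties`, and other types must not.

```python
from typing import List

# Field types from the table above.
FIELD_TYPES = {"text", "number", "date", "boolean", "object"}


def validate_field(field: dict, path: str = "") -> List[str]:
    """Return the problems found in one field definition (empty list if none)."""
    name = field.get("name")
    where = f"{path}{name or '<unnamed>'}"
    errors: List[str] = []
    if not name:
        errors.append(f"{where}: missing name")
    ftype = field.get("type")
    if ftype not in FIELD_TYPES:
        errors.append(f"{where}: unknown type {ftype!r}")
    if "isArray" in field and not isinstance(field["isArray"], bool):
        errors.append(f"{where}: isArray must be true or false")
    props = field.get("properties")
    if ftype == "object":
        # Assumption: an object field needs at least one nested property.
        if not props:
            errors.append(f"{where}: object fields need properties")
        else:
            # Recurse into nested properties, prefixing their path with the parent name.
            for child in props:
                errors.extend(validate_field(child, where + "."))
    elif props:
        # Assumption: properties only make sense on object fields.
        errors.append(f"{where}: properties only apply to object fields")
    return errors
```

For example, `validate_field({"name": "x", "type": "string"})` reports the unknown type "string", because the table lists `text`.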

Schema versions

Each update creates a new schema version. Jobs reference a specific version so extractions remain reproducible. List versions:

curl http://localhost:4000/api/schema/YOUR_SCHEMA_ID/versions \
-H "Authorization: Bearer YOUR_TOKEN"

AI schema generation

Generate a schema from a sample document, a text prompt, or both:

curl -X POST http://localhost:4000/api/schema/ai \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@sample-invoice.pdf" \
-F "mode=document_with_prompt" \
-F "schemaName=Invoice" \
-F "userPrompt=Extract all invoice fields including tax"

Modes:

  • document_only — Infer fields from the uploaded document
  • prompt_only — Generate from text description only
  • document_with_prompt — Combine document and prompt
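The mode follows directly from which inputs you supply. Here is a minimal sketch. It assumes `schemaName` is sent in every mode, because this page does not say whether it is required. `choose_mode` and `ai_form_args` are hypothetical helpers that produce the same `-F` arguments curl uses.

```python
from typing import List, Optional


def choose_mode(has_document: bool, has_prompt: bool) -> str:
    """Pick the generation mode that matches the inputs you have."""
    if has_document and has_prompt:
        return "document_with_prompt"
    if has_document:
        return "document_only"
    if has_prompt:
        return "prompt_only"
    raise ValueError("provide a sample document, a prompt, or both")


def ai_form_args(schema_name: str, document: Optional[str] = None,
                 prompt: Optional[str] = None) -> List[str]:
    """Build curl -F arguments for POST /api/schema/ai (hypothetical helper)."""
    parts = []
    if document:
        parts.append(f"file=@{document}")  # "@" tells curl to upload this file
    parts.append(f"mode={choose_mode(bool(document), bool(prompt))}")
    parts.append(f"schemaName={schema_name}")  # assumption: sent in every mode
    if prompt:
        parts.append(f"userPrompt={prompt}")
    args: List[str] = []
    for part in parts:
        args += ["-F", part]
    return args
```

Supplying both a file and a prompt selects `document_with_prompt`. Supplying only a prompt selects `prompt_only`.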

Allowed values and fuzzy matching

Fields can restrict output to a list of allowed values with optional fuzzy matching:

{
"name": "category",
"type": "text",
"required": true,
"possibleValues": [
{ "id": "cat-1", "label": "Office Supplies" },
{ "id": "cat-2", "label": "Travel" }
],
"fuzzyMatchOptions": {
"keepUnmatched": true,
"returnId": true
}
}
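The sketch below illustrates the idea behind fuzzy matching by mapping an extracted string onto the closest allowed label. It uses Python's `difflib` similarity ratio with a 0.6 cutoff as a stand-in. The service's actual matching algorithm and threshold are not documented here. The behavior of `keepUnmatched` (keep the raw value when nothing matches) and `returnId` (return the `id` instead of the `label`) is inferred from the option names. `match_allowed_value` is hypothetical.

```python
import difflib
from typing import Optional


def match_allowed_value(raw: str, possible_values: list,
                        keep_unmatched: bool = True, return_id: bool = True,
                        cutoff: float = 0.6) -> Optional[str]:
    """Map an extracted string onto the closest allowed value (illustrative only)."""
    # Compare case-insensitively by indexing allowed values on their lowercased label.
    by_label = {v["label"].lower(): v for v in possible_values}
    hits = difflib.get_close_matches(raw.lower(), list(by_label), n=1, cutoff=cutoff)
    if hits:
        chosen = by_label[hits[0]]
        # Assumed meaning of returnId: emit the value's id instead of its label.
        return chosen["id"] if return_id else chosen["label"]
    # Assumed meaning of keepUnmatched: pass the raw value through when nothing matches.
    return raw if keep_unmatched else None


categories = [
    {"id": "cat-1", "label": "Office Supplies"},
    {"id": "cat-2", "label": "Travel"},
]
```

With both options set to true, as in the example field, "office supply" maps to cat-1. "Catering" matches neither label, so it is passed through unchanged.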

Allowed values can also be loaded from a remote source. Test a source:

curl -X POST http://localhost:4000/api/schema/allowed-values-source/test \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/categories.json"}'

Webhook binding

Schemas can reference webhook endpoints via webhookEndpointIds (max 20). When the list is set, only those endpoints receive events for jobs using this schema.

  • Non-empty list: only bound endpoints are notified
  • Empty list (default): all of the user's webhooks receive events

List a schema's webhook bindings:

curl http://localhost:4000/api/schema/YOUR_SCHEMA_ID/webhooks \
-H "Authorization: Bearer YOUR_TOKEN"
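These binding rules reduce to a small filter. This is a sketch that assumes bound IDs outside the user's registered endpoints are ignored. `endpoints_for_job` is a hypothetical name.

```python
from typing import List

MAX_BOUND_ENDPOINTS = 20  # documented limit for webhookEndpointIds


def endpoints_for_job(webhook_endpoint_ids: List[str],
                      user_endpoint_ids: List[str]) -> List[str]:
    """Return the user's endpoints that should be notified for a job on this schema."""
    if len(webhook_endpoint_ids) > MAX_BOUND_ENDPOINTS:
        raise ValueError(f"at most {MAX_BOUND_ENDPOINTS} endpoints can be bound to a schema")
    if not webhook_endpoint_ids:
        # Empty list (the default): every webhook the user has registered applies.
        return list(user_endpoint_ids)
    # Non-empty list: only bound endpoints are notified, in the user's original order.
    bound = set(webhook_endpoint_ids)
    return [e for e in user_endpoint_ids if e in bound]
```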

Manage schemas

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/schema | List schemas |
| GET | /api/schema/:id | Get schema |
| PUT | /api/schema/:id | Update schema (creates new version) |
| DELETE | /api/schema/:id | Delete schema |
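You can wrap these four routes in a single request builder. Here is a minimal standard-library sketch. `schema_request` is hypothetical. It assumes PUT takes a JSON body shaped like the create payload. The route table implies this but does not spell it out.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:4000"

# (method, needs a schema id) pairs from the route table above.
ROUTES = {("GET", False), ("GET", True), ("PUT", True), ("DELETE", True)}


def schema_request(method: str, schema_id: Optional[str] = None,
                   body: Optional[dict] = None,
                   token: str = "YOUR_TOKEN") -> urllib.request.Request:
    """Build (but do not send) a request for one of the schema management routes."""
    method = method.upper()
    if (method, schema_id is not None) not in ROUTES:
        raise ValueError(f"no route for {method} with schema_id={schema_id!r}")
    # Assumption: only PUT (update) carries a JSON body, shaped like the create payload.
    if (body is not None) != (method == "PUT"):
        raise ValueError("a body is required for PUT and not allowed otherwise")

    path = "/api/schema"
    if schema_id is not None:
        path += "/" + urllib.parse.quote(schema_id, safe="")

    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  method=method, headers=headers)
```

To run a call, pass the result to `urllib.request.urlopen` while the server is running.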

Schema conversion (public)

Convert legacy schema formats:

curl -X POST http://localhost:4000/api/schema/convert \
-H "Content-Type: application/json" \
-d '{"fields": [...]}'

This endpoint is public and does not require authentication.