Schemas
A schema defines the structure of data you want extracted from documents. Each schema has a name, description, and a list of fields with types and validation rules.
Create a schema
curl -X POST http://localhost:4000/api/schema \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
  "name": "Invoice",
  "description": "Standard invoice extraction",
  "fields": [
    {
      "name": "vendor_name",
      "type": "text",
      "required": true,
      "description": "Name of the vendor or supplier"
    },
    {
      "name": "line_items",
      "type": "object",
      "required": false,
      "isArray": true,
      "description": "Individual line items",
      "properties": [
        { "name": "description", "type": "text", "required": true },
        { "name": "amount", "type": "number", "required": true }
      ]
    }
  ]
}'
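For callers who prefer a script to curl, the same request can be sketched in Python using only the standard library. The helper name `build_create_schema_request` and the `BASE_URL` constant are illustrative and not part of the API; the sketch only builds the request and leaves sending it to the caller, since the response format is not described here.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example above.
BASE_URL = "http://localhost:4000"


def build_create_schema_request(token: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/schema request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/schema",
        data=json.dumps(schema).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The same schema body as the curl example.
invoice_schema = {
    "name": "Invoice",
    "description": "Standard invoice extraction",
    "fields": [
        {
            "name": "vendor_name",
            "type": "text",
            "required": True,
            "description": "Name of the vendor or supplier",
        },
        {
            "name": "line_items",
            "type": "object",
            "required": False,
            "isArray": True,
            "description": "Individual line items",
            "properties": [
                {"name": "description", "type": "text", "required": True},
                {"name": "amount", "type": "number", "required": True},
            ],
        },
    ],
}

request = build_create_schema_request("YOUR_TOKEN", invoice_schema)
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect or log before it reaches the server.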
Field types
| Type | Description |
|---|---|
| text | String values |
| number | Numeric values |
| date | Date strings |
| boolean | True/false |
| object | Nested structure with properties |
Any field can hold an array of values: set "isArray": true on it, as the line_items field in the create example does.
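A client-side check can catch a mistyped field type before the schema is submitted. The following is a minimal sketch, not the server's validation logic: `validate_field` is a hypothetical helper, and the rules that an object field needs a non-empty properties list and that only object fields carry properties are assumptions inferred from the table above.

```python
# The five field types listed in the table above.
ALLOWED_TYPES = {"text", "number", "date", "boolean", "object"}


def validate_field(field: dict, path: str = "") -> list[str]:
    """Return a list of problems found in one field definition."""
    name = field.get("name", "<unnamed>")
    here = f"{path}.{name}" if path else name
    problems = []

    field_type = field.get("type")
    if field_type not in ALLOWED_TYPES:
        problems.append(f"{here}: unknown type {field_type!r}")

    if field_type == "object":
        properties = field.get("properties")
        # Assumption: an object field must declare at least one property.
        if not properties:
            problems.append(f"{here}: object field has no properties")
        for child in properties or []:
            problems.extend(validate_field(child, here))
    elif "properties" in field:
        # Assumption: properties are only meaningful on object fields.
        problems.append(f"{here}: only object fields may have properties")

    return problems


line_items = {
    "name": "line_items",
    "type": "object",
    "isArray": True,
    "properties": [
        {"name": "description", "type": "text", "required": True},
        {"name": "amount", "type": "number", "required": True},
    ],
}
print(validate_field(line_items))  # prints []
```

Problems are collected rather than raised so that a schema with several mistakes can be fixed in one pass.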
Schema versions
Each update creates a new schema version. Jobs reference a specific version so extractions remain reproducible. List versions:
curl http://localhost:4000/api/schema/YOUR_SCHEMA_ID/versions \
-H "Authorization: Bearer YOUR_TOKEN"
AI schema generation
Generate a schema from a sample document, a text prompt, or both:
curl -X POST http://localhost:4000/api/schema/ai \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@sample-invoice.pdf" \
-F "mode=document_only" \
-F "schemaName=Invoice" \
-F "userPrompt=Extract all invoice fields including tax"
Modes:
- document_only: Infer fields from the uploaded document
- prompt_only: Generate from text description only
- document_with_prompt: Combine document and prompt
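Each mode implies which form inputs have to be present. The sketch below encodes that reading as a lookup table; the mapping is an assumption drawn from the mode descriptions, not a documented rule, and `missing_inputs` is a hypothetical helper. Extra inputs are tolerated, since the curl example sends userPrompt together with document_only.

```python
# Assumption: inputs each mode needs, inferred from the mode descriptions.
REQUIRED_INPUTS = {
    "document_only": {"file"},
    "prompt_only": {"userPrompt"},
    "document_with_prompt": {"file", "userPrompt"},
}


def missing_inputs(mode: str, provided: set[str]) -> set[str]:
    """Return the form fields that `mode` needs but `provided` lacks."""
    if mode not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return REQUIRED_INPUTS[mode] - provided


# The curl example above supplies file, mode, schemaName and userPrompt.
print(missing_inputs("document_only", {"file", "schemaName", "userPrompt"}))  # prints set()
print(missing_inputs("document_with_prompt", {"file"}))  # prints {'userPrompt'}
```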
Allowed values and fuzzy matching
Fields can restrict output to a list of allowed values with optional fuzzy matching:
{
  "name": "category",
  "type": "text",
  "required": true,
  "possibleValues": [
    { "id": "cat-1", "label": "Office Supplies" },
    { "id": "cat-2", "label": "Travel" }
  ],
  "fuzzyMatchOptions": {
    "keepUnmatched": true,
    "returnId": true
  }
}
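To give a feel for what these options could mean, here is an illustrative sketch of fuzzy matching against an allowed-values list. It is not the service's matching algorithm: it substitutes Python's `difflib` for whatever the server uses, and it assumes that returnId selects between returning the matched value's id and its label, and that keepUnmatched decides whether a value with no close match is passed through or dropped. The function name and the `cutoff` parameter are hypothetical.

```python
import difflib


def match_allowed_value(raw, possible_values, keep_unmatched=True,
                        return_id=True, cutoff=0.6):
    """Map an extracted string onto the closest allowed value, if any."""
    # Compare case-insensitively; keep a lookup back to the full entry.
    by_label = {value["label"].lower(): value for value in possible_values}
    hits = difflib.get_close_matches(
        raw.strip().lower(), list(by_label), n=1, cutoff=cutoff
    )
    if hits:
        value = by_label[hits[0]]
        # Assumed meaning of returnId: id instead of label.
        return value["id"] if return_id else value["label"]
    # Assumed meaning of keepUnmatched: pass the raw value through or drop it.
    return raw if keep_unmatched else None


categories = [
    {"id": "cat-1", "label": "Office Supplies"},
    {"id": "cat-2", "label": "Travel"},
]

print(match_allowed_value("office suplies", categories))  # prints cat-1
print(match_allowed_value("Catering", categories, keep_unmatched=False))  # prints None
```

Under these assumptions, setting keepUnmatched to true preserves unexpected values for later review instead of silently discarding them.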
Allowed values can also be loaded from a remote source. Test a source:
curl -X POST http://localhost:4000/api/schema/allowed-values-source/test \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/categories.json"}'
Webhook binding
Schemas can reference webhook endpoints via webhookEndpointIds (max 20). When set, only those endpoints receive events for jobs using this schema.
curl http://localhost:4000/api/schema/YOUR_SCHEMA_ID/webhooks \
-H "Authorization: Bearer YOUR_TOKEN"
- Non-empty list: only bound endpoints are notified
- Empty list (default): all user webhooks apply
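The two rules above amount to a simple selection step, sketched here for illustration. `endpoints_to_notify` is a hypothetical helper and not the service's dispatch code; the limit of 20 comes from the text above, while rejecting a longer list with an error is an assumption.

```python
MAX_BOUND_ENDPOINTS = 20  # limit on webhookEndpointIds stated above


def endpoints_to_notify(bound_ids: list[str], user_endpoint_ids: list[str]) -> list[str]:
    """Pick which webhook endpoints receive events for a job's schema."""
    if len(bound_ids) > MAX_BOUND_ENDPOINTS:
        # Assumption: an over-long list is rejected rather than truncated.
        raise ValueError(f"at most {MAX_BOUND_ENDPOINTS} endpoints can be bound")
    if bound_ids:
        # Non-empty list: only bound endpoints are notified.
        return list(bound_ids)
    # Empty list (default): all user webhooks apply.
    return list(user_endpoint_ids)


print(endpoints_to_notify([], ["wh-a", "wh-b"]))        # prints ['wh-a', 'wh-b']
print(endpoints_to_notify(["wh-b"], ["wh-a", "wh-b"]))  # prints ['wh-b']
```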
Manage schemas
| Method | Path | Description |
|---|---|---|
| GET | /api/schema | List schemas |
| GET | /api/schema/:id | Get schema |
| PUT | /api/schema/:id | Update schema (creates new version) |
| DELETE | /api/schema/:id | Delete schema |
Schema conversion (public)
Convert legacy schema formats:
curl -X POST http://localhost:4000/api/schema/convert \
-H "Content-Type: application/json" \
-d '{"fields": [...]}'
This endpoint is public and does not require authentication.