Documents

Every uploaded or imported file is registered as a Document record. Documents persist independently of jobs, so you can re-run extraction with a different schema without uploading the file again.

Get document metadata

curl http://localhost:4000/api/documents/DOCUMENT_ID \
-H "Authorization: Bearer YOUR_TOKEN"

List extractions for a document

View all extraction results across jobs for a single document:

curl http://localhost:4000/api/documents/DOCUMENT_ID/extractions \
-H "Authorization: Bearer YOUR_TOKEN"
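If you script these two calls, a small wrapper keeps the base URL and token in one place. The sketch below assumes a bash-compatible shell and `curl`. `API_BASE`, `API_TOKEN`, `api_get`, `get_document` and `list_extractions` are hypothetical names chosen for illustration; they are not part of the API. The `-f` flag makes `curl` exit non-zero on HTTP 4xx/5xx responses, so a bad token or unknown document ID is not silently ignored.

```shell
# Hypothetical helpers: API_BASE and API_TOKEN are illustrative names,
# not variables defined by the API itself.
API_BASE="${API_BASE:-http://localhost:4000}"
API_TOKEN="${API_TOKEN:-YOUR_TOKEN}"

# GET a path relative to API_BASE with the bearer token attached.
# -f: non-zero exit on HTTP errors; -sS: no progress bar, but still show errors.
api_get() {
  curl -fsS "${API_BASE}$1" \
    -H "Authorization: Bearer ${API_TOKEN}"
}

# Fetch a single document's metadata.
# Usage: get_document DOCUMENT_ID
get_document() {
  api_get "/api/documents/$1"
}

# List every extraction result for a document, across all jobs.
# Usage: list_extractions DOCUMENT_ID
list_extractions() {
  api_get "/api/documents/$1/extractions"
}
```

For example, `get_document DOCUMENT_ID` prints the metadata response and `list_extractions DOCUMENT_ID` prints the extraction list, both exactly as the server returns them.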

Re-run extraction

Create a new extraction job from an existing document:

curl -X POST http://localhost:4000/api/documents/DOCUMENT_ID/extractions \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"type": "EXTRACTION",
"schemaId": "NEW_SCHEMA_ID"
}'
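Because documents outlive jobs, trying several schemas against the same file comes down to repeated POSTs to this endpoint. The following is a minimal sketch, assuming a bash-compatible shell and `curl`. `rerun_extraction`, `rerun_with_schemas`, `API_BASE` and `API_TOKEN` are hypothetical names, and the request body mirrors the example above (`type` and `schemaId` only). The response format is not described on this page, so the sketch just prints whatever the server returns.

```shell
# Hypothetical helpers; API_BASE and API_TOKEN are illustrative names.
API_BASE="${API_BASE:-http://localhost:4000}"
API_TOKEN="${API_TOKEN:-YOUR_TOKEN}"

# Start a new EXTRACTION job for an existing document with the given schema.
# Usage: rerun_extraction DOCUMENT_ID SCHEMA_ID
rerun_extraction() {
  local doc_id="$1" schema_id="$2" body
  # Same JSON body as the curl example above. No JSON escaping is done,
  # so this assumes IDs contain no double quotes or backslashes.
  body=$(printf '{"type": "EXTRACTION", "schemaId": "%s"}' "$schema_id")
  curl -fsS -X POST "${API_BASE}/api/documents/${doc_id}/extractions" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$body"
}

# Re-run one document against several schemas, one job per schema.
# Stops at the first request that fails.
# Usage: rerun_with_schemas DOCUMENT_ID SCHEMA_ID [SCHEMA_ID ...]
rerun_with_schemas() {
  local target_doc="$1"
  shift
  for schema in "$@"; do
    rerun_extraction "$target_doc" "$schema" || return 1
  done
}
```

For example, `rerun_with_schemas DOCUMENT_ID SCHEMA_A SCHEMA_B` creates two extraction jobs from the same stored file, and the results of both appear in the document's extractions list.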

Alternatively, use POST /api/jobs/from-document/:documentId; see Jobs for details.

Document lifecycle

Documents are created when:

  • A file is uploaded via POST /api/jobs
  • A file is imported from an integration (Google Drive, Dropbox, S3, or a URL)
  • A presigned upload is confirmed via POST /api/jobs/confirm-upload

Imported files are downloaded asynchronously via the import queue before jobs are created.

Storage

By default, files are stored on the local filesystem in ./uploads (STORAGE_DRIVER=local). In production, configure S3-compatible storage through environment variables instead. With local storage, stored files are also served as static files under /uploads.