
Integrations

Connect external file sources and automate batch extractions via API or workflow tools like n8n.

Overview

ExtractForm supports importing files from:

  • Google Drive (OAuth)
  • Dropbox (OAuth)
  • User-owned S3 (access key + bucket)
  • Public URLs (Excel/CSV lists, direct links)

Files are downloaded asynchronously via the import queue, registered as Document records, and processed as child jobs under a JobRun.

OAuth connect flow

Connect Google Drive

curl -X POST http://localhost:4000/api/integrations/google-drive/connect \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"label": "My Drive"}'

Response: { "authUrl": "...", "state": "..." } — redirect the user to authUrl.

OAuth callback: GET /api/integrations/google-drive/callback?code=...&state=...
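
As a rough illustration of the connect step, the Python sketch below builds the same POST as the curl example and reads authUrl and state from the response body. The helper names (build_connect_request, parse_connect_response) are hypothetical; the endpoint, headers, and response fields are taken from the example above. Nothing is sent over the network until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"  # same host as the curl examples


def build_connect_request(provider: str, label: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that starts an OAuth connection.

    `provider` is the path segment from the docs, e.g. "google-drive" or "dropbox".
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/api/integrations/{provider}/connect",
        data=json.dumps({"label": label}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_connect_response(body: str) -> tuple[str, str]:
    """Return (authUrl, state) from the JSON response body."""
    payload = json.loads(body)
    return payload["authUrl"], payload["state"]
```

After sending the request, the caller would redirect the user's browser to the returned authUrl; the provider then calls the callback route shown above.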

Connect Dropbox

curl -X POST http://localhost:4000/api/integrations/dropbox/connect \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"label": "Accounting Dropbox"}'

In the Dropbox App Console, enable these permissions:

  • account_info.read — test connection
  • files.metadata.read — browse folders
  • files.content.read — download files

After changing permissions, disconnect and reconnect so a new token is issued.
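
When a Dropbox connection fails to browse or download, a quick check is whether the token carries all three permissions. The sketch below assumes you have the granted scopes as a space-separated string (the usual OAuth 2.0 form); how ExtractForm exposes that string is not documented here, and missing_scopes is a hypothetical helper.

```python
# The three permissions listed above, mapped to what each one is used for.
REQUIRED_SCOPES = {
    "account_info.read": "test connection",
    "files.metadata.read": "browse folders",
    "files.content.read": "download files",
}


def missing_scopes(granted: str) -> list[str]:
    """Return the required scopes that are absent from a space-separated scope string."""
    have = set(granted.split())
    return [scope for scope in REQUIRED_SCOPES if scope not in have]
```

A non-empty result suggests the permission was enabled after the token was issued, in which case disconnecting and reconnecting should issue a token with the new scopes.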

Connect user S3 bucket

curl -X POST http://localhost:4000/api/integrations/s3/connect \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "label": "Invoices bucket",
    "accessKeyId": "...",
    "secretAccessKey": "...",
    "bucket": "my-bucket",
    "region": "eu-west-1",
    "prefix": "invoices/"
  }'
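
To keep access keys out of scripts and shell history, the request body can be assembled from environment variables. This is a minimal sketch: s3_connect_payload is a hypothetical helper, and reading the credentials from AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY is an assumption (those are the conventional AWS variable names, not something ExtractForm requires).

```python
import os
from typing import Mapping, Optional


def s3_connect_payload(
    label: str,
    bucket: str,
    region: str,
    prefix: str,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build the JSON body for POST /api/integrations/s3/connect."""
    env = os.environ if env is None else env
    # Assumption: credentials are stored under the conventional AWS names.
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    return {
        "label": label,
        "accessKeyId": env["AWS_ACCESS_KEY_ID"],
        "secretAccessKey": env["AWS_SECRET_ACCESS_KEY"],
        "bucket": bucket,
        "region": region,
        "prefix": prefix,
    }
```

The returned dict has the same fields as the curl body above and can be serialized with json.dumps before sending.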

Browse remote files

curl "http://localhost:4000/api/integrations/INTEGRATION_ID/files?folderId=root" \
  -H "Authorization: Bearer YOUR_TOKEN"
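
Folder identifiers other than root may contain characters that need URL encoding (whether they are opaque IDs or paths depends on the provider, which is not specified here). The sketch below builds the browse URL with the standard library so the folderId query parameter is always encoded; browse_files_url is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:4000"  # same host as the curl examples


def browse_files_url(integration_id: str, folder_id: str = "root") -> str:
    """Return the GET URL for listing files in one folder of an integration."""
    path = f"/api/integrations/{quote(integration_id, safe='')}/files"
    query = urlencode({"folderId": folder_id})  # percent-encodes reserved characters
    return f"{BASE_URL}{path}?{query}"
```

The request still needs the Authorization: Bearer header shown in the curl example.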

Batch import from connected storage

curl -X POST http://localhost:4000/api/job-runs/from-sources \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "integrationId": "INTEGRATION_ID",
    "fileIds": ["file-id-1", "file-id-2"],
    "schemaId": "SCHEMA_ID",
    "jobType": "EXTRACTION"
  }'

See Batch Runs for monitoring progress.

Batch import from public URLs

curl -X POST http://localhost:4000/api/job-runs/from-urls \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "schemaId": "SCHEMA_ID",
    "jobType": "EXTRACTION",
    "urls": [
      { "url": "https://example.com/invoice1.pdf", "externalRef": "row-1" }
    ]
  }'
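
If the URLs live in a spreadsheet export, the request body can be generated on the client side. The sketch below converts CSV text into the urls array, using the row number as externalRef as in the example above. The column name url and the helper urls_payload_from_csv are assumptions made for illustration.

```python
import csv
import io


def urls_payload_from_csv(csv_text: str, schema_id: str, url_column: str = "url") -> dict:
    """Turn CSV text with a URL column into a /api/job-runs/from-urls body."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row_number, row in enumerate(reader, start=1):
        url = (row.get(url_column) or "").strip()
        if not url:
            continue  # skip rows without a URL, but keep numbering by data row
        urls.append({"url": url, "externalRef": f"row-{row_number}"})
    return {"schemaId": schema_id, "jobType": "EXTRACTION", "urls": urls}
```

Keeping the row number in externalRef makes it possible to match each result back to the spreadsheet row it came from.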

Automation trigger (n8n)

A single endpoint accepts files from mixed sources (public URLs and connected integrations) in one request:

curl -X POST http://localhost:4000/api/integrations/trigger \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "schemaId": "SCHEMA_ID",
    "jobType": "EXTRACTION",
    "files": [
      { "url": "https://example.com/doc.pdf", "externalRef": "n8n-1" },
      {
        "integrationId": "INTEGRATION_ID",
        "fileId": "DRIVE_FILE_ID",
        "externalRef": "n8n-2"
      }
    ]
  }'
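
Each entry in files takes one of two shapes: a public URL, or an integrationId plus fileId. The sketch below builds entries of either shape and rejects ambiguous ones before the request is sent. The helpers trigger_file and trigger_payload are hypothetical, and the validation is a client-side convention; how the server treats an entry that carries both shapes is not documented here.

```python
from typing import Optional


def trigger_file(
    external_ref: str,
    *,
    url: Optional[str] = None,
    integration_id: Optional[str] = None,
    file_id: Optional[str] = None,
) -> dict:
    """Build one `files` entry: a public URL, or a file in a connected integration."""
    if url is not None:
        if integration_id is not None or file_id is not None:
            raise ValueError("pass either url or integration_id + file_id, not both")
        return {"url": url, "externalRef": external_ref}
    if integration_id is None or file_id is None:
        raise ValueError("integration_id and file_id are both required when url is absent")
    return {"integrationId": integration_id, "fileId": file_id, "externalRef": external_ref}


def trigger_payload(schema_id: str, files: list) -> dict:
    """Build the JSON body for POST /api/integrations/trigger."""
    return {"schemaId": schema_id, "jobType": "EXTRACTION", "files": list(files)}
```

For example, trigger_payload("SCHEMA_ID", [trigger_file("n8n-1", url="https://example.com/doc.pdf")]) produces a body with a single URL entry.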

n8n example workflow

  1. HTTP Request: POST /api/integrations/trigger with API key header
  2. Wait or poll GET /api/job-runs/:jobRunId until status is COMPLETED
  3. Optionally listen for job_run.completed on your webhook endpoint
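
Outside n8n, step 2 of the workflow above can be sketched as a polling loop. The function takes a callable that returns the current status string, so the HTTP call to GET /api/job-runs/:jobRunId is left to the caller. Only COMPLETED comes from this page; the FAILED status name, the default interval, and the timeout are assumptions to adjust against the actual API.

```python
import time
from typing import Callable, Iterable


def wait_for_job_run(
    fetch_status: Callable[[], str],
    *,
    interval: float = 5.0,
    timeout: float = 600.0,
    failed_statuses: Iterable[str] = ("FAILED",),  # assumed name, not from the docs
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job run reaches COMPLETED or an assumed failure status."""
    terminal = {"COMPLETED", *failed_statuses}
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job run still {status!r} after {timeout} seconds")
        sleep(interval)
```

Injecting sleep and clock keeps the loop easy to test; in production the defaults apply. Listening for job_run.completed on a webhook (step 3) avoids polling altogether.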

Presigned S3 upload

# Step 1: Request a presigned upload URL
curl -X POST http://localhost:4000/api/jobs/upload-url \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"filename": "invoice.pdf"}'

# Step 2: Upload the file to the returned uploadUrl

# Step 3: Confirm the upload and create the job
curl -X POST http://localhost:4000/api/jobs/confirm-upload \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "RETURNED_KEY",
    "filename": "invoice.pdf",
    "type": "EXTRACTION",
    "schemaId": "SCHEMA_ID"
  }'
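
Step 2 has no curl example, so here is a hedged sketch of what it might look like in Python, together with the body for step 3. It assumes the step 1 response is JSON with uploadUrl and key fields and that the presigned URL expects an HTTP PUT of the raw file bytes, which is common for S3 presigned URLs but not confirmed by this page. Both helper names are hypothetical.

```python
import urllib.request


def build_upload_request(
    upload_url: str,
    file_bytes: bytes,
    content_type: str = "application/pdf",
) -> urllib.request.Request:
    """Step 2: build a PUT of the raw bytes to the presigned URL.

    No Authorization header is added: a presigned URL carries its own signature.
    """
    return urllib.request.Request(
        url=upload_url,
        data=file_bytes,
        headers={"Content-Type": content_type},
        method="PUT",  # assumption: the URL was signed for PUT
    )


def confirm_upload_body(step1_response: dict, filename: str, schema_id: str) -> dict:
    """Step 3: build the JSON body for POST /api/jobs/confirm-upload."""
    return {
        "key": step1_response["key"],  # assumed field name in the step 1 response
        "filename": filename,
        "type": "EXTRACTION",
        "schemaId": schema_id,
    }
```

If the server signs a specific Content-Type into the URL, the upload must send the same value or S3 will reject the request.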

Scheduled folder sync

Register a watch to poll a remote folder on a schedule:

curl -X POST http://localhost:4000/api/integrations/sync-watches \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "integrationId": "INTEGRATION_ID",
    "folderId": "root",
    "folderName": "Invoices",
    "schemaId": "SCHEMA_ID",
    "jobType": "EXTRACTION",
    "importExisting": false
  }'

Option                  Description
importExisting: true    Import all existing files on first sync
importExisting: false   Only files added after watch creation
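
The effect of importExisting on the first sync can be pictured as a filter over the folder listing. This is only a model of the documented behavior, not the service's implementation: the addedAt field and the timestamp comparison are assumptions, and how ExtractForm actually decides that a file was "added after watch creation" is not described here.

```python
from datetime import datetime


def files_for_first_sync(files: list, watch_created_at: datetime, import_existing: bool) -> list:
    """Select which files the first sync would import under each setting."""
    if import_existing:
        return list(files)  # importExisting: true imports everything already in the folder
    # importExisting: false keeps only files added after the watch was created
    return [f for f in files if f["addedAt"] > watch_created_at]
```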

Manage watches:

  • GET /api/integrations/sync-watches — list
  • PATCH /api/integrations/sync-watches/:watchId — update
  • DELETE /api/integrations/sync-watches/:watchId — delete

Manage integrations

Method  Path                         Description
GET     /api/integrations            List connections
GET     /api/integrations/:id/files  Browse files
POST    /api/integrations/:id/test   Test connection
DELETE  /api/integrations/:id        Disconnect

Environment variables

Variable                      Description
GOOGLE_OAUTH_CLIENT_ID        Google OAuth client ID
GOOGLE_OAUTH_CLIENT_SECRET    Google OAuth secret
GOOGLE_OAUTH_REDIRECT_URI     OAuth callback URL
DROPBOX_OAUTH_CLIENT_ID       Dropbox app key
DROPBOX_OAUTH_CLIENT_SECRET   Dropbox app secret
DROPBOX_OAUTH_REDIRECT_URI    OAuth callback URL
WEBHOOK_ENCRYPTION_KEY        Encrypts OAuth tokens at rest
IMPORT_QUEUE_CONCURRENCY      Parallel import workers (default 3)
INTEGRATION_SYNC_CRON         Cron for scheduled folder sync
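
For a deployment script, these variables can be checked before the service starts. The sketch below reads the import settings with the documented default of 3 workers and lists which OAuth variables are unset for a provider. The names ImportSettings, load_import_settings, and missing_oauth_vars are hypothetical, and treating all three OAuth variables as required per provider is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ImportSettings:
    concurrency: int          # IMPORT_QUEUE_CONCURRENCY
    sync_cron: Optional[str]  # INTEGRATION_SYNC_CRON, None when unset


def load_import_settings(env: Optional[Mapping[str, str]] = None) -> ImportSettings:
    """Read import queue settings, falling back to the documented default of 3."""
    env = os.environ if env is None else env
    concurrency = int(env.get("IMPORT_QUEUE_CONCURRENCY", "3"))
    if concurrency < 1:
        raise ValueError("IMPORT_QUEUE_CONCURRENCY must be at least 1")
    return ImportSettings(concurrency=concurrency, sync_cron=env.get("INTEGRATION_SYNC_CRON"))


def missing_oauth_vars(provider: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List unset OAuth variables for a provider prefix, "GOOGLE" or "DROPBOX"."""
    env = os.environ if env is None else env
    names = [f"{provider}_OAUTH_{suffix}" for suffix in ("CLIENT_ID", "CLIENT_SECRET", "REDIRECT_URI")]
    return [name for name in names if not env.get(name)]
```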