Integrations
Connect external file sources and automate batch extractions via API or workflow tools like n8n.
Overview
ExtractForm supports importing files from:
- Google Drive (OAuth)
- Dropbox (OAuth)
- User-owned S3 (access key + bucket)
- Public URLs (Excel/CSV lists, direct links)
Files are downloaded asynchronously via the import queue, registered as Document records, and processed as child jobs under a JobRun.
OAuth connect flow
Connect Google Drive
curl -X POST http://localhost:4000/api/integrations/google-drive/connect \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"label": "My Drive"}'
Response: { "authUrl": "...", "state": "..." } — redirect the user to authUrl.
OAuth callback: GET /api/integrations/google-drive/callback?code=...&state=...
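As a rough illustration of the connect step above, the following Python sketch builds the connect request and reads authUrl and state from the response body. The helper names and the BASE_URL constant are assumptions made for this example; only the endpoint path, headers, and response fields come from the curl call above. Passing a provider other than google-drive assumes that provider returns the same response shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"  # assumption: local development server


def build_connect_request(provider, label, token, base_url=BASE_URL):
    """Build (but do not send) POST /api/integrations/<provider>/connect."""
    body = json.dumps({"label": label}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/integrations/{provider}/connect",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_connect_response(raw_body):
    """Return (authUrl, state) from the JSON response body."""
    payload = json.loads(raw_body)
    return payload["authUrl"], payload["state"]
```

To send the request, pass it to urllib.request.urlopen, read the body, and redirect the user's browser to the returned authUrl.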
Connect Dropbox
curl -X POST http://localhost:4000/api/integrations/dropbox/connect \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"label": "Accounting Dropbox"}'
In the Dropbox App Console, enable these permissions:
- account_info.read: test connection
- files.metadata.read: browse folders
- files.content.read: download files
After changing permissions, disconnect and reconnect so a new token is issued.
Connect user S3 bucket
curl -X POST http://localhost:4000/api/integrations/s3/connect \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"label": "Invoices bucket",
"accessKeyId": "...",
"secretAccessKey": "...",
"bucket": "my-bucket",
"region": "eu-west-1",
"prefix": "invoices/"
}'
Browse remote files
curl "http://localhost:4000/api/integrations/INTEGRATION_ID/files?folderId=root" \
-H "Authorization: Bearer YOUR_TOKEN"
Batch import from connected storage
curl -X POST http://localhost:4000/api/job-runs/from-sources \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"integrationId": "INTEGRATION_ID",
"fileIds": ["file-id-1", "file-id-2"],
"schemaId": "SCHEMA_ID",
"jobType": "EXTRACTION"
}'
See Batch Runs for monitoring progress.
Batch import from public URLs
curl -X POST http://localhost:4000/api/job-runs/from-urls \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"schemaId": "SCHEMA_ID",
"jobType": "EXTRACTION",
"urls": [
{ "url": "https://example.com/invoice1.pdf", "externalRef": "row-1" }
]
}'
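When the URLs live in a CSV list, the request body can be generated rather than written by hand. The sketch below is one possible approach: it assumes the CSV has a header row with a column named url (a hypothetical name) and labels each entry row-N, following the externalRef shown in the example above. The function name is illustrative only.

```python
import csv
import io


def payload_from_csv(csv_text, schema_id, url_column="url", job_type="EXTRACTION"):
    """Turn a CSV list of URLs into a body for POST /api/job-runs/from-urls."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row_number, row in enumerate(reader, start=1):
        url = (row.get(url_column) or "").strip()
        if not url:
            continue  # skip rows with an empty URL cell
        # externalRef lets each result be traced back to its CSV row
        urls.append({"url": url, "externalRef": f"row-{row_number}"})
    return {"schemaId": schema_id, "jobType": job_type, "urls": urls}
```

Serialize the returned dictionary with json.dumps and send it as the request body.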
Automation trigger (n8n)
A single endpoint accepts a mix of public URLs and files from connected storage:
curl -X POST http://localhost:4000/api/integrations/trigger \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"schemaId": "SCHEMA_ID",
"jobType": "EXTRACTION",
"files": [
{ "url": "https://example.com/doc.pdf", "externalRef": "n8n-1" },
{
"integrationId": "INTEGRATION_ID",
"fileId": "DRIVE_FILE_ID",
"externalRef": "n8n-2"
}
]
}'
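Each entry in files uses one of two shapes: a public url, or an integrationId plus fileId pair. The sketch below builds such entries and rejects mixed or incomplete ones before the request is sent. The helper names are hypothetical, and the strict either/or check is a client-side convention assumed here, not documented API behavior.

```python
def trigger_file(external_ref, url=None, integration_id=None, file_id=None):
    """Build one entry for the files array of POST /api/integrations/trigger."""
    storage_fields = (integration_id, file_id)
    if url is not None:
        if any(field is not None for field in storage_fields):
            raise ValueError("give either url or integrationId + fileId, not both")
        return {"url": url, "externalRef": external_ref}
    if any(field is None for field in storage_fields):
        raise ValueError("a storage entry needs both integrationId and fileId")
    return {
        "integrationId": integration_id,
        "fileId": file_id,
        "externalRef": external_ref,
    }


def build_trigger_payload(schema_id, files, job_type="EXTRACTION"):
    """Assemble the full request body from already-built file entries."""
    return {"schemaId": schema_id, "jobType": job_type, "files": list(files)}
```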
n8n example workflow
- HTTP Request → POST /api/integrations/trigger with API key header
- Wait or poll GET /api/job-runs/:jobRunId until status is COMPLETED
- Optionally listen for job_run.completed on your webhook endpoint
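Outside n8n, the polling step could look roughly like the Python sketch below. It assumes the caller supplies fetch_status, a hypothetical function that calls GET /api/job-runs/:jobRunId and returns the run's status string. The interval and timeout defaults are arbitrary, and statuses other than COMPLETED are not interpreted because they are not documented here.

```python
import time


def wait_for_job_run(fetch_status, job_run_id, interval=5.0, timeout=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the job run reports COMPLETED or the timeout elapses.

    fetch_status is a caller-supplied (hypothetical) function that performs
    GET /api/job-runs/<job_run_id> and returns the status string.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(job_run_id)
        if status == "COMPLETED":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job run {job_run_id} still {status!r} after {timeout}s"
            )
        sleep(interval)
```

The sleep and clock parameters exist so the loop can be exercised without real delays. A webhook listener for job_run.completed avoids polling altogether.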
Presigned S3 upload
# Step 1: Request a presigned upload URL
curl -X POST http://localhost:4000/api/jobs/upload-url \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"filename": "invoice.pdf"}'
# Step 2: Upload the file to the returned uploadUrl
# Step 3: Confirm the upload using the returned key
curl -X POST http://localhost:4000/api/jobs/confirm-upload \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"key": "RETURNED_KEY",
"filename": "invoice.pdf",
"type": "EXTRACTION",
"schemaId": "SCHEMA_ID"
}'
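Step 2 has no command above, so here is a minimal Python sketch of it together with the step 3 body. It assumes the presigned URL accepts an HTTP PUT with the raw file bytes, which is common for S3 presigned uploads but not stated here. The application/octet-stream content type and the helper names are also assumptions.

```python
import urllib.request


def build_upload_request(upload_url, file_bytes):
    """Build (but do not send) the step 2 upload to the presigned URL.

    Assumption: the URL was signed for PUT with a raw request body.
    """
    return urllib.request.Request(
        upload_url,
        data=file_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="PUT",
    )


def build_confirm_payload(key, filename, schema_id, job_type="EXTRACTION"):
    """Body for step 3, POST /api/jobs/confirm-upload."""
    return {
        "key": key,
        "filename": filename,
        "type": job_type,
        "schemaId": schema_id,
    }
```

Note that confirm-upload takes the job type in a field named type, whereas the batch endpoints use jobType.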
Scheduled folder sync
Register a watch to poll a remote folder on a schedule:
curl -X POST http://localhost:4000/api/integrations/sync-watches \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"integrationId": "INTEGRATION_ID",
"folderId": "root",
"folderName": "Invoices",
"schemaId": "SCHEMA_ID",
"jobType": "EXTRACTION",
"importExisting": false
}'
| Option | Description |
|---|---|
| importExisting: true | Import all existing files on first sync |
| importExisting: false | Only files added after watch creation |
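To make the two importExisting settings concrete, the sketch below models the selection on the first sync. It is an illustration of the described behavior, not the service's implementation: the addedAt field and the function name are hypothetical, and how the service actually decides when a file was added is not specified here.

```python
from datetime import datetime, timezone


def select_files_for_first_sync(files, watch_created_at, import_existing):
    """Model which files a new watch would pick up on its first sync.

    Each file is a dict with a hypothetical "addedAt" datetime.
    """
    if import_existing:
        return list(files)  # everything already in the folder
    # otherwise only files that appeared after the watch was created
    return [f for f in files if f["addedAt"] > watch_created_at]


watch_created = datetime(2024, 1, 10, tzinfo=timezone.utc)
folder = [
    {"id": "old", "addedAt": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": "new", "addedAt": datetime(2024, 1, 15, tzinfo=timezone.utc)},
]
```

With this sample folder, importExisting: true selects both files, while importExisting: false selects only the one with id "new".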
Manage watches:
- GET /api/integrations/sync-watches (list)
- PATCH /api/integrations/sync-watches/:watchId (update)
- DELETE /api/integrations/sync-watches/:watchId (delete)
Manage integrations
| Method | Path | Description |
|---|---|---|
| GET | /api/integrations | List connections |
| GET | /api/integrations/:id/files | Browse files |
| POST | /api/integrations/:id/test | Test connection |
| DELETE | /api/integrations/:id | Disconnect |
Environment variables
| Variable | Description |
|---|---|
| GOOGLE_OAUTH_CLIENT_ID | Google OAuth client ID |
| GOOGLE_OAUTH_CLIENT_SECRET | Google OAuth secret |
| GOOGLE_OAUTH_REDIRECT_URI | OAuth callback URL |
| DROPBOX_OAUTH_CLIENT_ID | Dropbox app key |
| DROPBOX_OAUTH_CLIENT_SECRET | Dropbox app secret |
| DROPBOX_OAUTH_REDIRECT_URI | OAuth callback URL |
| WEBHOOK_ENCRYPTION_KEY | Encrypts OAuth tokens at rest |
| IMPORT_QUEUE_CONCURRENCY | Parallel import workers (default 3) |
| INTEGRATION_SYNC_CRON | Cron for scheduled folder sync |
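A deployment script might check these variables before starting the service. The following sketch is one way to do that, assuming each provider needs all three of its OAuth variables and that IMPORT_QUEUE_CONCURRENCY must be a positive integer. The function name and the shape of the returned dictionary are hypothetical.

```python
import os

PROVIDER_VARS = {
    "google-drive": ("GOOGLE_OAUTH_CLIENT_ID", "GOOGLE_OAUTH_CLIENT_SECRET",
                     "GOOGLE_OAUTH_REDIRECT_URI"),
    "dropbox": ("DROPBOX_OAUTH_CLIENT_ID", "DROPBOX_OAUTH_CLIENT_SECRET",
                "DROPBOX_OAUTH_REDIRECT_URI"),
}


def check_integration_env(env=None):
    """Summarize integration settings found in the environment."""
    env = os.environ if env is None else env
    # Fall back to the documented default of 3 when unset or empty
    concurrency = int(env.get("IMPORT_QUEUE_CONCURRENCY") or "3")
    if concurrency < 1:
        raise ValueError("IMPORT_QUEUE_CONCURRENCY must be at least 1")
    missing = {
        provider: [name for name in names if not env.get(name)]
        for provider, names in PROVIDER_VARS.items()
    }
    return {
        "importQueueConcurrency": concurrency,
        "syncCron": env.get("INTEGRATION_SYNC_CRON"),
        "hasEncryptionKey": bool(env.get("WEBHOOK_ENCRYPTION_KEY")),
        "missingOAuthVars": missing,
    }
```

Calling check_integration_env() with no argument reads os.environ. Passing a plain dictionary makes the check easy to run in tests.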