Documents
Submit documents for AI-powered data extraction
Submit documents to the DocExtract API for processing and data extraction. Processing is asynchronous: each submission returns a job ID that you can use to check status and retrieve results.
The Document Job Object
{
"id": "job_abc123...",
"org_id": "org_xyz789...",
"extractor_id": "ext_def456...",
"status": "completed",
"document_url": "https://storage.adteco.com/documents/...",
"mime_type": "application/pdf",
"file_size_bytes": 245678,
"extracted_data": {
"invoice_number": "INV-2024-001",
"total_amount": 1250.50,
"invoice_date": "2024-11-15"
},
"confidence": {
"invoice_number": 0.98,
"total_amount": 0.95,
"invoice_date": 0.99
},
"processing_time_ms": 4532,
"cost_credits": 2,
"error_details": null,
"created_at": "2024-11-23T10:00:00Z",
"started_at": "2024-11-23T10:00:01Z",
"completed_at": "2024-11-23T10:00:05Z"
}
Job Statuses
| Status | Description |
|---|---|
| queued | Job is waiting to be processed |
| processing | Document is currently being analyzed |
| completed | Extraction completed successfully |
| failed | Processing failed (see error_details) |
Attributes
| Attribute | Type | Description |
|---|---|---|
| id | string | Unique job identifier |
| org_id | string | Organization ID |
| extractor_id | string | Extractor template used |
| status | string | Current job status |
| document_url | string | Secure URL to access the document |
| mime_type | string | Document MIME type |
| file_size_bytes | number | Document file size in bytes |
| extracted_data | object | Extracted field values |
| confidence | object | Confidence scores per field (0-1) |
| processing_time_ms | number | Processing duration in milliseconds |
| cost_credits | number | Credits consumed |
| error_details | object | Error information if the job failed |
| created_at | string | Job creation timestamp |
| started_at | string | Processing start timestamp |
| completed_at | string | Processing completion timestamp |
Submit a Document
Submit a document for extraction. The document must be base64-encoded.
Supported File Types
| Format | MIME Type | Max Size |
|---|---|---|
| PDF | application/pdf | 10 MB |
| PNG | image/png | 10 MB |
| JPG/JPEG | image/jpeg | 10 MB |
| WEBP | image/webp | 10 MB |
| TIFF | image/tiff | 10 MB |
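Before submitting, it can save credits to reject unsupported or oversized files client-side. The following is a minimal sketch of such a pre-flight check; the `validate_document` helper and its error messages are illustrative, not part of the API.

```python
import mimetypes
import os

# Supported types and size limit taken from the table above
SUPPORTED_MIME_TYPES = {
    'application/pdf', 'image/png', 'image/jpeg', 'image/webp', 'image/tiff',
}
MAX_SIZE_BYTES = 10 * 1024 * 1024  # 10 MB

def validate_document(path: str) -> str:
    """Return the document's MIME type, or raise if it cannot be submitted."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f'Unsupported format: {mime_type}')
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        raise ValueError('File exceeds the 10 MB limit')
    return mime_type
```

Catching these errors locally avoids a round trip that would fail with `unsupported_format` or `document_too_large`.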
# Convert document to base64
base64_doc=$(base64 -i invoice.pdf)
curl -X POST https://api.adteco.com/v1/documents \
-H "Authorization: Bearer sk_live_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"extractor_id": "ext_abc123...",
"document": "'$base64_doc'",
"mime_type": "application/pdf",
"metadata": {
"source": "email_attachment",
"user_id": "user_123"
}
}'
import fs from 'fs';
// Read and convert document to base64
const documentBuffer = fs.readFileSync('./invoice.pdf');
const base64Document = documentBuffer.toString('base64');
const response = await fetch('https://api.adteco.com/v1/documents', {
method: 'POST',
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
'Content-Type': 'application/json',
},
body: JSON.stringify({
extractor_id: 'ext_abc123...',
document: base64Document,
mime_type: 'application/pdf',
metadata: {
source: 'email_attachment',
user_id: 'user_123',
},
}),
});
const job = await response.json();
console.log('Job ID:', job.id);
console.log('Status:', job.status);
import requests
import base64
# Read and convert document to base64
with open('invoice.pdf', 'rb') as f:
document_bytes = f.read()
base64_document = base64.b64encode(document_bytes).decode('utf-8')
response = requests.post(
'https://api.adteco.com/v1/documents',
headers={
'Authorization': 'Bearer sk_live_your_api_key',
'Content-Type': 'application/json',
},
json={
'extractor_id': 'ext_abc123...',
'document': base64_document,
'mime_type': 'application/pdf',
'metadata': {
'source': 'email_attachment',
'user_id': 'user_123',
},
},
)
job = response.json()
print('Job ID:', job['id'])
print('Status:', job['status'])
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| extractor_id | string | Yes | Extractor template to use |
| document | string | Yes | Base64-encoded document |
| mime_type | string | Yes | Document MIME type |
| metadata | object | No | Custom metadata (max 10 KB) |
| webhook_url | string | No | Override default webhook URL |
| priority | string | No | Job priority: low, normal, high |
Response
{
"id": "job_abc123...",
"extractor_id": "ext_abc123...",
"status": "queued",
"created_at": "2024-11-23T10:00:00Z"
}
Processing Time: Most documents process in 3-10 seconds. Complex documents may take up to 30 seconds.
Get Job Status and Results
Retrieve the current status and extracted data for a job.
curl -X GET https://api.adteco.com/v1/documents/job_abc123... \
-H "Authorization: Bearer sk_live_your_api_key"
const jobId = 'job_abc123...';
const response = await fetch(
`https://api.adteco.com/v1/documents/${jobId}`,
{
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
},
}
);
const job = await response.json();
if (job.status === 'completed') {
console.log('Extracted data:', job.extracted_data);
console.log('Confidence scores:', job.confidence);
} else if (job.status === 'failed') {
console.error('Job failed:', job.error_details);
} else {
console.log('Job is still processing...');
}
job_id = 'job_abc123...'
response = requests.get(
f'https://api.adteco.com/v1/documents/{job_id}',
headers={'Authorization': 'Bearer sk_live_your_api_key'},
)
job = response.json()
if job['status'] == 'completed':
print('Extracted data:', job['extracted_data'])
print('Confidence scores:', job['confidence'])
elif job['status'] == 'failed':
print('Job failed:', job['error_details'])
else:
print('Job is still processing...')
Response (Completed)
{
"id": "job_abc123...",
"extractor_id": "ext_abc123...",
"status": "completed",
"extracted_data": {
"invoice_number": "INV-2024-001",
"total_amount": 1250.50,
"invoice_date": "2024-11-15",
"vendor_name": "Acme Corporation"
},
"confidence": {
"invoice_number": 0.98,
"total_amount": 0.95,
"invoice_date": 0.99,
"vendor_name": 0.92
},
"processing_time_ms": 4532,
"cost_credits": 2,
"completed_at": "2024-11-23T10:00:05Z"
}
Response (Failed)
{
"id": "job_abc123...",
"status": "failed",
"error_details": {
"code": "document_unreadable",
"message": "Document quality is too low to extract text",
"suggestion": "Please provide a higher quality scan or image"
},
"cost_credits": 0
}
Polling for Results
Poll the job endpoint until processing completes.
async function waitForResults(jobId: string): Promise<any> {
const maxAttempts = 30; // 60 seconds total
const pollInterval = 2000; // 2 seconds
for (let attempt = 0; attempt < maxAttempts; attempt++) {
const response = await fetch(
`https://api.adteco.com/v1/documents/${jobId}`,
{
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
},
}
);
const job = await response.json();
if (job.status === 'completed') {
return job.extracted_data;
} else if (job.status === 'failed') {
throw new Error(`Job failed: ${job.error_details?.message}`);
}
// Wait before next poll
await new Promise(resolve => setTimeout(resolve, pollInterval));
}
throw new Error('Job processing timeout');
}
// Usage
try {
const extractedData = await waitForResults('job_abc123...');
console.log(extractedData);
} catch (error) {
console.error('Error:', error.message);
}
import time
def wait_for_results(job_id: str, max_attempts: int = 30):
"""Poll for job results at a fixed 2-second interval."""
poll_interval = 2 # seconds
for attempt in range(max_attempts):
response = requests.get(
f'https://api.adteco.com/v1/documents/{job_id}',
headers={'Authorization': 'Bearer sk_live_your_api_key'},
)
job = response.json()
if job['status'] == 'completed':
return job['extracted_data']
elif job['status'] == 'failed':
raise Exception(f"Job failed: {job['error_details']['message']}")
# Wait before next poll
time.sleep(poll_interval)
raise Exception('Job processing timeout')
# Usage
try:
extracted_data = wait_for_results('job_abc123...')
print(extracted_data)
except Exception as e:
print(f'Error: {e}')
Better Alternative: Use webhooks instead of polling to receive real-time notifications when jobs complete.
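A webhook receiver only needs to dispatch on the job's status. The sketch below assumes the webhook payload mirrors the job object shown above (verify against your webhook configuration); the `on_success` and `on_failure` callbacks stand in for your own downstream handling.

```python
def handle_webhook(payload: dict, on_success, on_failure) -> str:
    """Dispatch a job-status webhook payload.

    Assumes the payload mirrors the job object (id, status, extracted_data,
    error_details); on_success/on_failure are your application's callbacks.
    """
    status = payload.get('status')
    if status == 'completed':
        on_success(payload['id'], payload['extracted_data'])
        return 'processed'
    if status == 'failed':
        # error_details carries code, message, and suggestion
        on_failure(payload['id'], payload.get('error_details'))
        return 'failed'
    # Intermediate statuses (queued, processing) need no action
    return 'ignored'
```

Wire this into whatever HTTP framework serves your webhook endpoint; the dispatch logic itself stays framework-agnostic and easy to test.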
List All Jobs
Retrieve all document processing jobs for your organization.
curl -X GET "https://api.adteco.com/v1/documents?status=completed&limit=20" \
-H "Authorization: Bearer sk_live_your_api_key"
const params = new URLSearchParams({
status: 'completed',
limit: '20',
offset: '0',
});
const response = await fetch(
`https://api.adteco.com/v1/documents?${params}`,
{
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
},
}
);
const data = await response.json();
console.log(`Found ${data.jobs.length} jobs`);
params = {
'status': 'completed',
'limit': 20,
'offset': 0,
}
response = requests.get(
'https://api.adteco.com/v1/documents',
headers={'Authorization': 'Bearer sk_live_your_api_key'},
params=params,
)
data = response.json()
print(f"Found {len(data['jobs'])} jobs")
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| extractor_id | string | Filter by extractor |
| status | string | Filter by status |
| created_after | string | ISO 8601 timestamp |
| created_before | string | ISO 8601 timestamp |
| limit | number | Max results (default: 50, max: 100) |
| offset | number | Results offset for pagination |
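When an organization has more jobs than fit in one page, you can walk the `limit`/`offset` parameters until `total` is exhausted. This is a sketch with the HTTP call abstracted behind a `fetch_page` callable, so the paging logic is shown on its own; substitute your actual GET request.

```python
def list_all_jobs(fetch_page, limit: int = 100):
    """Collect every job by paging through limit/offset.

    fetch_page(limit, offset) stands in for GET /v1/documents and must
    return the parsed response: {'jobs': [...], 'total': N, ...}.
    """
    jobs, offset = [], 0
    while True:
        page = fetch_page(limit, offset)
        jobs.extend(page['jobs'])
        offset += limit
        # Stop once we've covered the reported total or a page comes back empty
        if offset >= page['total'] or not page['jobs']:
            break
    return jobs
```

Using the maximum `limit` of 100 minimizes the number of requests for large result sets.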
Response
{
"jobs": [
{
"id": "job_abc123...",
"extractor_id": "ext_def456...",
"status": "completed",
"extracted_data": {...},
"cost_credits": 2,
"created_at": "2024-11-23T10:00:00Z",
"completed_at": "2024-11-23T10:00:05Z"
}
],
"total": 42,
"limit": 50,
"offset": 0
}
Confidence Scores
Every extracted field includes a confidence score (0-1) indicating how confident the AI is in the extraction.
Interpreting Confidence
| Range | Interpretation | Recommendation |
|---|---|---|
| 0.95-1.0 | Very High | Safe to auto-process |
| 0.85-0.94 | High | Generally reliable |
| 0.70-0.84 | Medium | Consider manual review |
| 0.50-0.69 | Low | Recommend manual review |
| <0.50 | Very Low | Require manual review |
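The ranges above translate directly into a threshold function. This is a minimal sketch; the tier names are illustrative labels, not values the API returns.

```python
def confidence_tier(score: float) -> str:
    """Map a 0-1 confidence score to the recommendation from the table above."""
    if score >= 0.95:
        return 'auto_process'      # Very High
    if score >= 0.85:
        return 'reliable'          # High
    if score >= 0.70:
        return 'consider_review'   # Medium
    if score >= 0.50:
        return 'recommend_review'  # Low
    return 'require_review'        # Very Low
```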
Using Confidence Scores
function shouldReviewField(field: string, value: any, confidence: number): boolean {
// Critical financial fields need higher confidence
const criticalFields = ['total_amount', 'payment_amount'];
const confidenceThreshold = criticalFields.includes(field) ? 0.90 : 0.70;
return confidence < confidenceThreshold;
}
// Check each field
const job = await getJob('job_abc123...');
const fieldsToReview = [];
for (const [field, value] of Object.entries(job.extracted_data)) {
const confidence = job.confidence[field];
if (shouldReviewField(field, value, confidence)) {
fieldsToReview.push({ field, value, confidence });
}
}
if (fieldsToReview.length > 0) {
console.log('Fields requiring review:', fieldsToReview);
// Send to manual review queue
} else {
// Auto-process with high confidence
await processInvoice(job.extracted_data);
}
Error Handling
Handle different failure scenarios gracefully.
Common Error Codes
| Code | Description | Resolution |
|---|---|---|
| document_too_large | File exceeds 10 MB | Compress or split document |
| document_unreadable | Cannot extract text | Improve scan quality |
| unsupported_format | Invalid file format | Use supported formats |
| insufficient_credits | Not enough credits | Purchase more credits |
| extractor_not_found | Invalid extractor ID | Check extractor exists |
| rate_limit_exceeded | Too many requests | Reduce request rate |
Example Error Handling
async function processDocument(extractorId: string, documentPath: string) {
try {
// Submit document
const job = await submitDocument(extractorId, documentPath);
// Wait for results
const results = await waitForResults(job.id);
return results;
} catch (error) {
if (error.code === 'document_unreadable') {
console.error('Document quality too low. Please rescan.');
// Notify user to provide better scan
} else if (error.code === 'insufficient_credits') {
console.error('Out of credits. Purchase more to continue.');
// Redirect to billing page
} else if (error.code === 'rate_limit_exceeded') {
console.error('Rate limit exceeded. Retrying in 60 seconds...');
// Implement exponential backoff
await new Promise(resolve => setTimeout(resolve, 60000));
return processDocument(extractorId, documentPath);
} else {
console.error('Unexpected error:', error);
// Log to error tracking service
}
}
}
Metadata
Attach custom metadata to jobs for tracking and organization.
const response = await fetch('https://api.adteco.com/v1/documents', {
method: 'POST',
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
'Content-Type': 'application/json',
},
body: JSON.stringify({
extractor_id: 'ext_abc123...',
document: base64Document,
mime_type: 'application/pdf',
metadata: {
// Your custom data
customer_id: 'cust_123',
invoice_type: 'recurring',
source: 'email_attachment',
processed_by: 'user_456',
tags: ['urgent', 'large_amount'],
},
}),
});
Metadata is returned with job results and included in webhook notifications.
Metadata Limits: Maximum 10 KB per job. Metadata is not indexed or searchable.
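To avoid a rejected submission, you can enforce the 10 KB limit client-side. This sketch measures the UTF-8 size of the serialized JSON, which is an assumption about how the limit is counted; confirm against the API's actual behavior.

```python
import json

MAX_METADATA_BYTES = 10 * 1024  # 10 KB limit from the note above

def validate_metadata(metadata: dict) -> dict:
    """Raise if the metadata would exceed the 10 KB limit when serialized.

    Assumes the limit applies to the UTF-8 JSON encoding of the object.
    """
    size = len(json.dumps(metadata).encode('utf-8'))
    if size > MAX_METADATA_BYTES:
        raise ValueError(f'Metadata is {size} bytes; limit is {MAX_METADATA_BYTES}')
    return metadata
```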
Document URLs
Successfully processed documents are stored securely and accessible via the document_url field.
Accessing Documents
const job = await getJob('job_abc123...');
// Document URL is pre-signed and expires after 24 hours
const documentUrl = job.document_url;
// Download the original document
const response = await fetch(documentUrl);
const blob = await response.blob();
// Save locally
const buffer = Buffer.from(await blob.arrayBuffer());
fs.writeFileSync('downloaded_invoice.pdf', buffer);
Security: Document URLs are pre-signed and expire after 24 hours. Generate new URLs by fetching the job again.
Best Practices
Document Quality
For best extraction results:
- Resolution: Minimum 150 DPI, recommended 300+ DPI
- Format: PDF is preferred; avoid low-quality JPEGs
- Orientation: Ensure documents are right-side up
- Cropping: Include full document pages
- Clarity: Avoid blurry or distorted scans
Batch Processing
Process multiple documents efficiently:
async function processBatch(extractorId: string, documents: string[]) {
// Submit all documents in parallel
const submissions = documents.map(doc =>
fetch('https://api.adteco.com/v1/documents', {
method: 'POST',
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
'Content-Type': 'application/json',
},
body: JSON.stringify({
extractor_id: extractorId,
document: doc,
mime_type: 'application/pdf',
}),
})
);
const responses = await Promise.all(submissions);
// Parse each response body to get the job objects
const jobs = await Promise.all(responses.map(r => r.json()));
// Use webhooks to receive results as they complete
// Or poll for results
return jobs.map(j => j.id);
}
Cost Optimization
Minimize credit usage:
- Use test keys during development
- Test extractors thoroughly before production use
- Implement quality checks before submission
- Cache results to avoid reprocessing
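The last point can be sketched as a content-addressed cache: identical documents hash to the same key, so a resubmission returns the cached extraction instead of spending credits. The `submit` callable and dict-backed cache are stand-ins; swap in your real submission call and a persistent store (e.g. Redis or a database) in production.

```python
import hashlib

def submit_with_cache(document_bytes: bytes, submit, cache: dict):
    """Skip reprocessing of identical documents via a content-hash cache.

    submit(document_bytes) stands in for the actual API submission;
    cache is any dict-like store mapping content hash -> extracted data.
    """
    key = hashlib.sha256(document_bytes).hexdigest()
    if key in cache:
        return cache[key]            # already extracted; no credits spent
    result = submit(document_bytes)  # real submission; consumes credits
    cache[key] = result
    return result
```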