Documents
Submit documents for AI-powered data extraction
Submit documents to the DocExtract API for processing and data extraction. Processing is asynchronous: each submission returns a job ID that you can use to check status and retrieve results.
The Document Job Object
{
"id": "job_abc123...",
"org_id": "org_xyz789...",
"extractor_id": "ext_def456...",
"status": "completed",
"document_url": "https://storage.adteco.com/documents/...",
"mime_type": "application/pdf",
"file_size_bytes": 245678,
"extracted_data": {
"invoice_number": "INV-2024-001",
"total_amount": 1250.50,
"invoice_date": "2024-11-15"
},
"confidence": {
"invoice_number": 0.98,
"total_amount": 0.95,
"invoice_date": 0.99
},
"processing_time_ms": 4532,
"cost_credits": 2,
"error_details": null,
"created_at": "2024-11-23T10:00:00Z",
"started_at": "2024-11-23T10:00:01Z",
"completed_at": "2024-11-23T10:00:05Z"
}
Job Statuses
| Status | Description |
|---|---|
| queued | Job is waiting to be processed |
| processing | Document is currently being analyzed |
| completed | Extraction completed successfully |
| failed | Processing failed (see error_details) |
Attributes
| Attribute | Type | Description |
|---|---|---|
| id | string | Unique job identifier |
| org_id | string | Organization ID |
| extractor_id | string | Extractor template used |
| status | string | Current job status |
| document_url | string | Secure URL to access the document |
| mime_type | string | Document MIME type |
| file_size_bytes | number | Document file size in bytes |
| extracted_data | object | Extracted field values |
| confidence | object | Confidence scores per field (0-1) |
| processing_time_ms | number | Processing duration in milliseconds |
| cost_credits | number | Credits consumed |
| error_details | object | Error information if the job failed |
| created_at | string | Job creation timestamp |
| started_at | string | Processing start timestamp |
| completed_at | string | Processing completion timestamp |
Submit a Document
Submit a document for extraction. The document must be base64-encoded.
Supported File Types
| Format | MIME Type | Max Size |
|---|---|---|
| PDF | application/pdf | 10 MB |
| PNG | image/png | 10 MB |
| JPG/JPEG | image/jpeg | 10 MB |
| WEBP | image/webp | 10 MB |
| TIFF | image/tiff | 10 MB |
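Before submitting, it can save credits to reject unsupported or oversized files client-side. The following is a minimal sketch of such a pre-flight check; the `validate_document` helper and its error messages are illustrative, not part of the API.

```python
import mimetypes
import os

# Supported types and size limit taken from the table above
SUPPORTED_MIME_TYPES = {
    'application/pdf', 'image/png', 'image/jpeg', 'image/webp', 'image/tiff',
}
MAX_SIZE_BYTES = 10 * 1024 * 1024  # 10 MB

def validate_document(path: str) -> str:
    """Return the document's MIME type, or raise if it cannot be submitted."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f'Unsupported format: {mime_type}')
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        raise ValueError('File exceeds the 10 MB limit')
    return mime_type
```

Catching these errors locally avoids a round trip that would fail with `unsupported_format` or `document_too_large`.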
# Convert document to base64
base64_doc=$(base64 -i invoice.pdf)
curl -X POST https://api.adteco.com/v1/documents \
-H "Authorization: Bearer sk_live_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"extractor_id": "ext_abc123...",
"document": "'$base64_doc'",
"mime_type": "application/pdf",
"metadata": {
"source": "email_attachment",
"user_id": "user_123"
}
}'
import fs from 'fs';
// Read and convert document to base64
const documentBuffer = fs.readFileSync('./invoice.pdf');
const base64Document = documentBuffer.toString('base64');
const response = await fetch('https://api.adteco.com/v1/documents', {
method: 'POST',
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
'Content-Type': 'application/json',
},
body: JSON.stringify({
extractor_id: 'ext_abc123...',
document: base64Document,
mime_type: 'application/pdf',
metadata: {
source: 'email_attachment',
user_id: 'user_123',
},
}),
});
const job = await response.json();
console.log('Job ID:', job.id);
console.log('Status:', job.status);
import requests
import base64
# Read and convert document to base64
with open('invoice.pdf', 'rb') as f:
document_bytes = f.read()
base64_document = base64.b64encode(document_bytes).decode('utf-8')
response = requests.post(
'https://api.adteco.com/v1/documents',
headers={
'Authorization': 'Bearer sk_live_your_api_key',
'Content-Type': 'application/json',
},
json={
'extractor_id': 'ext_abc123...',
'document': base64_document,
'mime_type': 'application/pdf',
'metadata': {
'source': 'email_attachment',
'user_id': 'user_123',
},
},
)
job = response.json()
print('Job ID:', job['id'])
print('Status:', job['status'])
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| extractor_id | string | Yes | Extractor template to use |
| document | string | Yes | Base64-encoded document |
| mime_type | string | Yes | Document MIME type |
| metadata | object | No | Custom metadata (max 10 KB) |
| webhook_url | string | No | Override default webhook URL |
| priority | string | No | Job priority: low, normal, high |
Response
{
"id": "job_abc123...",
"extractor_id": "ext_abc123...",
"status": "queued",
"created_at": "2024-11-23T10:00:00Z"
}
Processing Time: Most documents process in 3-10 seconds. Complex documents may take up to 30 seconds.
Get Job Status and Results
Retrieve the current status and extracted data for a job.
curl -X GET https://api.adteco.com/v1/documents/job_abc123... \
-H "Authorization: Bearer sk_live_your_api_key"
const jobId = 'job_abc123...';
const response = await fetch(
`https://api.adteco.com/v1/documents/${jobId}`,
{
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
},
}
);
const job = await response.json();
if (job.status === 'completed') {
console.log('Extracted data:', job.extracted_data);
console.log('Confidence scores:', job.confidence);
} else if (job.status === 'failed') {
console.error('Job failed:', job.error_details);
} else {
console.log('Job is still processing...');
}
job_id = 'job_abc123...'
response = requests.get(
f'https://api.adteco.com/v1/documents/{job_id}',
headers={'Authorization': 'Bearer sk_live_your_api_key'},
)
job = response.json()
if job['status'] == 'completed':
print('Extracted data:', job['extracted_data'])
print('Confidence scores:', job['confidence'])
elif job['status'] == 'failed':
print('Job failed:', job['error_details'])
else:
print('Job is still processing...')
Response (Completed)
{
"id": "job_abc123...",
"extractor_id": "ext_abc123...",
"status": "completed",
"extracted_data": {
"invoice_number": "INV-2024-001",
"total_amount": 1250.50,
"invoice_date": "2024-11-15",
"vendor_name": "Acme Corporation"
},
"confidence": {
"invoice_number": 0.98,
"total_amount": 0.95,
"invoice_date": 0.99,
"vendor_name": 0.92
},
"processing_time_ms": 4532,
"cost_credits": 2,
"completed_at": "2024-11-23T10:00:05Z"
}
Response (Failed)
{
"id": "job_abc123...",
"status": "failed",
"error_details": {
"code": "document_unreadable",
"message": "Document quality is too low to extract text",
"suggestion": "Please provide a higher quality scan or image"
},
"cost_credits": 0
}
Polling for Results
Poll the job endpoint until processing completes.
async function waitForResults(jobId: string): Promise<any> {
const maxAttempts = 30; // 60 seconds total
const pollInterval = 2000; // 2 seconds
for (let attempt = 0; attempt < maxAttempts; attempt++) {
const response = await fetch(
`https://api.adteco.com/v1/documents/${jobId}`,
{
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
},
}
);
const job = await response.json();
if (job.status === 'completed') {
return job.extracted_data;
} else if (job.status === 'failed') {
throw new Error(`Job failed: ${job.error_details?.message}`);
}
// Wait before next poll
await new Promise(resolve => setTimeout(resolve, pollInterval));
}
throw new Error('Job processing timeout');
}
// Usage
try {
const extractedData = await waitForResults('job_abc123...');
console.log(extractedData);
} catch (error) {
console.error('Error:', error.message);
}
import time
def wait_for_results(job_id: str, max_attempts: int = 30):
"""Poll for job results at a fixed 2-second interval."""
poll_interval = 2 # seconds
for attempt in range(max_attempts):
response = requests.get(
f'https://api.adteco.com/v1/documents/{job_id}',
headers={'Authorization': 'Bearer sk_live_your_api_key'},
)
job = response.json()
if job['status'] == 'completed':
return job['extracted_data']
elif job['status'] == 'failed':
raise Exception(f"Job failed: {job['error_details']['message']}")
# Wait before next poll
time.sleep(poll_interval)
raise Exception('Job processing timeout')
# Usage
try:
extracted_data = wait_for_results('job_abc123...')
print(extracted_data)
except Exception as e:
print(f'Error: {e}')
Better Alternative: Use webhooks instead of polling to receive real-time notifications when jobs complete.
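A webhook receiver only needs to dispatch on the job's status. The sketch below assumes the webhook payload mirrors the job object shown above (verify against your webhook configuration); the `on_success` and `on_failure` callbacks stand in for your own downstream handling.

```python
def handle_webhook(payload: dict, on_success, on_failure) -> str:
    """Dispatch a job-status webhook payload.

    Assumes the payload mirrors the job object (id, status, extracted_data,
    error_details); on_success/on_failure are your application's callbacks.
    """
    status = payload.get('status')
    if status == 'completed':
        on_success(payload['id'], payload['extracted_data'])
        return 'processed'
    if status == 'failed':
        # error_details carries code, message, and suggestion
        on_failure(payload['id'], payload.get('error_details'))
        return 'failed'
    # Intermediate statuses (queued, processing) need no action
    return 'ignored'
```

Wire this into whatever HTTP framework serves your webhook endpoint; the dispatch logic itself stays framework-agnostic and easy to test.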
List All Jobs
Retrieve all document processing jobs for your organization.
curl -X GET "https://api.adteco.com/v1/documents?status=completed&limit=20" \
-H "Authorization: Bearer sk_live_your_api_key"
const params = new URLSearchParams({
status: 'completed',
limit: '20',
offset: '0',
});
const response = await fetch(
`https://api.adteco.com/v1/documents?${params}`,
{
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
},
}
);
const data = await response.json();
console.log(`Found ${data.jobs.length} jobs`);
params = {
'status': 'completed',
'limit': 20,
'offset': 0,
}
response = requests.get(
'https://api.adteco.com/v1/documents',
headers={'Authorization': 'Bearer sk_live_your_api_key'},
params=params,
)
data = response.json()
print(f"Found {len(data['jobs'])} jobs")
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| extractor_id | string | Filter by extractor |
| status | string | Filter by status |
| created_after | string | ISO 8601 timestamp |
| created_before | string | ISO 8601 timestamp |
| limit | number | Max results (default: 50, max: 100) |
| offset | number | Results offset for pagination |
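When an organization has more jobs than fit in one page, you can walk the `limit`/`offset` parameters until `total` is exhausted. This is a sketch with the HTTP call abstracted behind a `fetch_page` callable, so the paging logic is shown on its own; substitute your actual GET request.

```python
def list_all_jobs(fetch_page, limit: int = 100):
    """Collect every job by paging through limit/offset.

    fetch_page(limit, offset) stands in for GET /v1/documents and must
    return the parsed response: {'jobs': [...], 'total': N, ...}.
    """
    jobs, offset = [], 0
    while True:
        page = fetch_page(limit, offset)
        jobs.extend(page['jobs'])
        offset += limit
        # Stop once we've covered the reported total or a page comes back empty
        if offset >= page['total'] or not page['jobs']:
            break
    return jobs
```

Using the maximum `limit` of 100 minimizes the number of requests for large result sets.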
Response
{
"jobs": [
{
"id": "job_abc123...",
"extractor_id": "ext_def456...",
"status": "completed",
"extracted_data": {...},
"cost_credits": 2,
"created_at": "2024-11-23T10:00:00Z",
"completed_at": "2024-11-23T10:00:05Z"
}
],
"total": 42,
"limit": 50,
"offset": 0
}
Confidence Scores
Every extracted field includes a confidence score (0-1) indicating how confident the AI is in the extraction.
Interpreting Confidence
| Range | Interpretation | Recommendation |
|---|---|---|
| 0.95-1.0 | Very High | Safe to auto-process |
| 0.85-0.94 | High | Generally reliable |
| 0.70-0.84 | Medium | Consider manual review |
| 0.50-0.69 | Low | Recommend manual review |
| <0.50 | Very Low | Require manual review |
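The ranges above translate directly into a threshold function. This is a minimal sketch; the tier names are illustrative labels, not values the API returns.

```python
def confidence_tier(score: float) -> str:
    """Map a 0-1 confidence score to the recommendation from the table above."""
    if score >= 0.95:
        return 'auto_process'      # Very High
    if score >= 0.85:
        return 'reliable'          # High
    if score >= 0.70:
        return 'consider_review'   # Medium
    if score >= 0.50:
        return 'recommend_review'  # Low
    return 'require_review'        # Very Low
```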
Using Confidence Scores
function shouldReviewField(field: string, value: any, confidence: number): boolean {
// Critical financial fields need higher confidence
const criticalFields = ['total_amount', 'payment_amount'];
const confidenceThreshold = criticalFields.includes(field) ? 0.90 : 0.70;
return confidence < confidenceThreshold;
}
// Check each field
const job = await getJob('job_abc123...');
const fieldsToReview = [];
for (const [field, value] of Object.entries(job.extracted_data)) {
const confidence = job.confidence[field];
if (shouldReviewField(field, value, confidence)) {
fieldsToReview.push({ field, value, confidence });
}
}
if (fieldsToReview.length > 0) {
console.log('Fields requiring review:', fieldsToReview);
// Send to manual review queue
} else {
// Auto-process with high confidence
await processInvoice(job.extracted_data);
}
Error Handling
Handle different failure scenarios gracefully.
Common Error Codes
| Code | Description | Resolution |
|---|---|---|
| document_too_large | File exceeds 10 MB | Compress or split document |
| document_unreadable | Cannot extract text | Improve scan quality |
| unsupported_format | Invalid file format | Use supported formats |
| insufficient_credits | Not enough credits | Purchase more credits |
| extractor_not_found | Invalid extractor ID | Check extractor exists |
| rate_limit_exceeded | Too many requests | Reduce request rate |
Example Error Handling
async function processDocument(extractorId: string, documentPath: string) {
try {
// Submit document
const job = await submitDocument(extractorId, documentPath);
// Wait for results
const results = await waitForResults(job.id);
return results;
} catch (error) {
if (error.code === 'document_unreadable') {
console.error('Document quality too low. Please rescan.');
// Notify user to provide better scan
} else if (error.code === 'insufficient_credits') {
console.error('Out of credits. Purchase more to continue.');
// Redirect to billing page
} else if (error.code === 'rate_limit_exceeded') {
console.error('Rate limit exceeded. Retrying in 60 seconds...');
// Implement exponential backoff
await new Promise(resolve => setTimeout(resolve, 60000));
return processDocument(extractorId, documentPath);
} else {
console.error('Unexpected error:', error);
// Log to error tracking service
}
}
}
Metadata
Attach custom metadata to jobs for tracking and organization.
const response = await fetch('https://api.adteco.com/v1/documents', {
method: 'POST',
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
'Content-Type': 'application/json',
},
body: JSON.stringify({
extractor_id: 'ext_abc123...',
document: base64Document,
mime_type: 'application/pdf',
metadata: {
// Your custom data
customer_id: 'cust_123',
invoice_type: 'recurring',
source: 'email_attachment',
processed_by: 'user_456',
tags: ['urgent', 'large_amount'],
},
}),
});
Metadata is returned with job results and included in webhook notifications.
Metadata Limits: Maximum 10 KB per job. Metadata is not indexed or searchable.
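To avoid a rejected submission, you can enforce the 10 KB limit client-side. This sketch measures the UTF-8 size of the serialized JSON, which is an assumption about how the limit is counted; confirm against the API's actual behavior.

```python
import json

MAX_METADATA_BYTES = 10 * 1024  # 10 KB limit from the note above

def validate_metadata(metadata: dict) -> dict:
    """Raise if the metadata would exceed the 10 KB limit when serialized.

    Assumes the limit applies to the UTF-8 JSON encoding of the object.
    """
    size = len(json.dumps(metadata).encode('utf-8'))
    if size > MAX_METADATA_BYTES:
        raise ValueError(f'Metadata is {size} bytes; limit is {MAX_METADATA_BYTES}')
    return metadata
```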
Document URLs
Successfully processed documents are stored securely and accessible via the document_url field.
Accessing Documents
const job = await getJob('job_abc123...');
// Document URL is pre-signed and expires after 24 hours
const documentUrl = job.document_url;
// Download the original document
const response = await fetch(documentUrl);
const blob = await response.blob();
// Save locally
const buffer = Buffer.from(await blob.arrayBuffer());
fs.writeFileSync('downloaded_invoice.pdf', buffer);
Security: Document URLs are pre-signed and expire after 24 hours. Generate new URLs by fetching the job again.
Best Practices
Document Quality
For best extraction results:
- Resolution: Minimum 150 DPI, recommended 300+ DPI
- Format: PDF is preferred; avoid low-quality JPEGs
- Orientation: Ensure documents are right-side up
- Cropping: Include full document pages
- Clarity: Avoid blurry or distorted scans
Batch Processing
Process multiple documents efficiently:
async function processBatch(extractorId: string, documents: string[]) {
// Submit all documents in parallel
const submissions = documents.map(doc =>
fetch('https://api.adteco.com/v1/documents', {
method: 'POST',
headers: {
'Authorization': 'Bearer sk_live_your_api_key',
'Content-Type': 'application/json',
},
body: JSON.stringify({
extractor_id: extractorId,
document: doc,
mime_type: 'application/pdf',
}),
})
);
const responses = await Promise.all(submissions);
// Parse each response body to get the job objects
const jobs = await Promise.all(responses.map(r => r.json()));
// Use webhooks to receive results as they complete
// Or poll for results
return jobs.map(j => j.id);
}
Cost Optimization
Minimize credit usage:
- Use test keys during development
- Test extractors thoroughly before production use
- Implement quality checks before submission
- Cache results to avoid reprocessing
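The last point can be sketched as a content-addressed cache: identical documents hash to the same key, so a resubmission returns the cached extraction instead of spending credits. The `submit` callable and dict-backed cache are stand-ins; swap in your real submission call and a persistent store (e.g. Redis or a database) in production.

```python
import hashlib

def submit_with_cache(document_bytes: bytes, submit, cache: dict):
    """Skip reprocessing of identical documents via a content-hash cache.

    submit(document_bytes) stands in for the actual API submission;
    cache is any dict-like store mapping content hash -> extracted data.
    """
    key = hashlib.sha256(document_bytes).hexdigest()
    if key in cache:
        return cache[key]            # already extracted; no credits spent
    result = submit(document_bytes)  # real submission; consumes credits
    cache[key] = result
    return result
```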