DocExtract API

Documents

Submit documents for AI-powered data extraction.

Submit documents to the DocExtract API for processing and data extraction. Documents are processed asynchronously; each submission returns a job ID that you can use to check status and retrieve results.

The Document Job Object

{
  "id": "job_abc123...",
  "org_id": "org_xyz789...",
  "extractor_id": "ext_def456...",
  "status": "completed",
  "document_url": "https://storage.adteco.com/documents/...",
  "mime_type": "application/pdf",
  "file_size_bytes": 245678,
  "extracted_data": {
    "invoice_number": "INV-2024-001",
    "total_amount": 1250.50,
    "invoice_date": "2024-11-15"
  },
  "confidence": {
    "invoice_number": 0.98,
    "total_amount": 0.95,
    "invoice_date": 0.99
  },
  "processing_time_ms": 4532,
  "cost_credits": 2,
  "error_details": null,
  "created_at": "2024-11-23T10:00:00Z",
  "started_at": "2024-11-23T10:00:01Z",
  "completed_at": "2024-11-23T10:00:05Z"
}

Job Statuses

  • queued: Job is waiting to be processed
  • processing: Document is currently being analyzed
  • completed: Extraction completed successfully
  • failed: Processing failed (see error_details)

Attributes

  • id (string): Unique job identifier
  • org_id (string): Organization ID
  • extractor_id (string): Extractor template used
  • status (string): Current job status
  • document_url (string): Secure URL to access the document
  • mime_type (string): Document MIME type
  • file_size_bytes (number): Document file size in bytes
  • extracted_data (object): Extracted field values
  • confidence (object): Confidence scores per field (0-1)
  • processing_time_ms (number): Processing duration in milliseconds
  • cost_credits (number): Credits consumed
  • error_details (object): Error information if the job failed
  • created_at (string): Job creation timestamp
  • started_at (string): Processing start timestamp
  • completed_at (string): Processing completion timestamp
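
For reference, a TypeScript sketch of the job object with field names and types taken from the attributes above (the interface and type names are illustrative, not part of the API):

type JobStatus = 'queued' | 'processing' | 'completed' | 'failed';

interface DocumentJob {
  id: string;
  org_id: string;
  extractor_id: string;
  status: JobStatus;
  document_url: string;
  mime_type: string;
  file_size_bytes: number;
  extracted_data: Record<string, unknown>;
  confidence: Record<string, number>; // per-field scores, 0 to 1
  processing_time_ms: number;
  cost_credits: number;
  error_details: { code: string; message: string; suggestion: string } | null; // shape taken from the failed-response example below
  created_at: string; // ISO 8601 timestamps
  started_at: string;
  completed_at: string;
}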

Submit a Document

Submit a document for extraction. The document must be base64-encoded.

Supported File Types

  • PDF (application/pdf): max 10 MB
  • PNG (image/png): max 10 MB
  • JPG/JPEG (image/jpeg): max 10 MB
  • WEBP (image/webp): max 10 MB
  • TIFF (image/tiff): max 10 MB

cURL

# Convert document to base64 (GNU/Linux users may need -w 0 to prevent line wrapping: base64 -w 0 invoice.pdf)
base64_doc=$(base64 -i invoice.pdf)

curl -X POST https://api.adteco.com/v1/documents \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "extractor_id": "ext_abc123...",
    "document": "'$base64_doc'",
    "mime_type": "application/pdf",
    "metadata": {
      "source": "email_attachment",
      "user_id": "user_123"
    }
  }'

JavaScript

import fs from 'fs';

// Read and convert document to base64
const documentBuffer = fs.readFileSync('./invoice.pdf');
const base64Document = documentBuffer.toString('base64');

const response = await fetch('https://api.adteco.com/v1/documents', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk_live_your_api_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    extractor_id: 'ext_abc123...',
    document: base64Document,
    mime_type: 'application/pdf',
    metadata: {
      source: 'email_attachment',
      user_id: 'user_123',
    },
  }),
});

const job = await response.json();
console.log('Job ID:', job.id);
console.log('Status:', job.status);

Python

import requests
import base64

# Read and convert document to base64
with open('invoice.pdf', 'rb') as f:
    document_bytes = f.read()
    base64_document = base64.b64encode(document_bytes).decode('utf-8')

response = requests.post(
    'https://api.adteco.com/v1/documents',
    headers={
        'Authorization': 'Bearer sk_live_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'extractor_id': 'ext_abc123...',
        'document': base64_document,
        'mime_type': 'application/pdf',
        'metadata': {
            'source': 'email_attachment',
            'user_id': 'user_123',
        },
    },
)

job = response.json()
print('Job ID:', job['id'])
print('Status:', job['status'])

Request Body

  • extractor_id (string, required): Extractor template to use
  • document (string, required): Base64-encoded document
  • mime_type (string, required): Document MIME type
  • metadata (object, optional): Custom metadata (max 10 KB)
  • webhook_url (string, optional): Override the default webhook URL
  • priority (string, optional): Job priority (low, normal, or high)
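
The optional webhook_url and priority fields are not shown in the examples above; a minimal sketch of a submission that sets them (the webhook URL is illustrative):

const response = await fetch('https://api.adteco.com/v1/documents', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk_live_your_api_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    extractor_id: 'ext_abc123...',
    document: base64Document, // base64-encoded file contents, as above
    mime_type: 'application/pdf',
    webhook_url: 'https://example.com/hooks/docextract', // overrides the default webhook URL for this job
    priority: 'high', // low | normal | high
  }),
});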

Response

{
  "id": "job_abc123...",
  "extractor_id": "ext_abc123...",
  "status": "queued",
  "created_at": "2024-11-23T10:00:00Z"
}

Processing Time: Most documents process in 3-10 seconds. Complex documents may take up to 30 seconds.

Get Job Status and Results

Retrieve the current status and extracted data for a job.

cURL

curl -X GET https://api.adteco.com/v1/documents/job_abc123... \
  -H "Authorization: Bearer sk_live_your_api_key"

JavaScript

const jobId = 'job_abc123...';

const response = await fetch(
  `https://api.adteco.com/v1/documents/${jobId}`,
  {
    headers: {
      'Authorization': 'Bearer sk_live_your_api_key',
    },
  }
);

const job = await response.json();

if (job.status === 'completed') {
  console.log('Extracted data:', job.extracted_data);
  console.log('Confidence scores:', job.confidence);
} else if (job.status === 'failed') {
  console.error('Job failed:', job.error_details);
} else {
  console.log('Job is still processing...');
}

Python

job_id = 'job_abc123...'

response = requests.get(
    f'https://api.adteco.com/v1/documents/{job_id}',
    headers={'Authorization': 'Bearer sk_live_your_api_key'},
)

job = response.json()

if job['status'] == 'completed':
    print('Extracted data:', job['extracted_data'])
    print('Confidence scores:', job['confidence'])
elif job['status'] == 'failed':
    print('Job failed:', job['error_details'])
else:
    print('Job is still processing...')

Response (Completed)

{
  "id": "job_abc123...",
  "extractor_id": "ext_abc123...",
  "status": "completed",
  "extracted_data": {
    "invoice_number": "INV-2024-001",
    "total_amount": 1250.50,
    "invoice_date": "2024-11-15",
    "vendor_name": "Acme Corporation"
  },
  "confidence": {
    "invoice_number": 0.98,
    "total_amount": 0.95,
    "invoice_date": 0.99,
    "vendor_name": 0.92
  },
  "processing_time_ms": 4532,
  "cost_credits": 2,
  "completed_at": "2024-11-23T10:00:05Z"
}

Response (Failed)

{
  "id": "job_abc123...",
  "status": "failed",
  "error_details": {
    "code": "document_unreadable",
    "message": "Document quality is too low to extract text",
    "suggestion": "Please provide a higher quality scan or image"
  },
  "cost_credits": 0
}

Polling for Results

Poll the job endpoint until processing completes.

TypeScript

async function waitForResults(jobId: string): Promise<any> {
  const maxAttempts = 30; // 60 seconds total
  const pollInterval = 2000; // 2 seconds

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch(
      `https://api.adteco.com/v1/documents/${jobId}`,
      {
        headers: {
          'Authorization': 'Bearer sk_live_your_api_key',
        },
      }
    );

    const job = await response.json();

    if (job.status === 'completed') {
      return job.extracted_data;
    } else if (job.status === 'failed') {
      throw new Error(`Job failed: ${job.error_details?.message}`);
    }

    // Wait before next poll
    await new Promise(resolve => setTimeout(resolve, pollInterval));
  }

  throw new Error('Job processing timeout');
}

// Usage
try {
  const extractedData = await waitForResults('job_abc123...');
  console.log(extractedData);
} catch (error) {
  console.error('Error:', error.message);
}

Python

import time

def wait_for_results(job_id: str, max_attempts: int = 30):
    """Poll for job results with exponential backoff."""
    poll_interval = 2  # seconds

    for attempt in range(max_attempts):
        response = requests.get(
            f'https://api.adteco.com/v1/documents/{job_id}',
            headers={'Authorization': 'Bearer sk_live_your_api_key'},
        )

        job = response.json()

        if job['status'] == 'completed':
            return job['extracted_data']
        elif job['status'] == 'failed':
            raise Exception(f"Job failed: {job['error_details']['message']}")

        # Wait before next poll
        time.sleep(poll_interval)

    raise Exception('Job processing timeout')

# Usage
try:
    extracted_data = wait_for_results('job_abc123...')
    print(extracted_data)
except Exception as e:
    print(f'Error: {e}')

Better Alternative: Use webhooks instead of polling to receive real-time notifications when jobs complete.
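
A minimal sketch of a webhook receiver, assuming the notification body resembles the job object shown above (the exact payload shape and any signature verification are not covered here; the Express endpoint and port are illustrative):

import express from 'express';

const app = express();
app.use(express.json({ limit: '1mb' }));

// Hypothetical endpoint registered as your webhook URL
app.post('/webhooks/docextract', (req, res) => {
  const job = req.body; // assumed to carry the job fields (id, status, extracted_data, ...)

  if (job.status === 'completed') {
    console.log('Job completed:', job.id, job.extracted_data);
  } else if (job.status === 'failed') {
    console.error('Job failed:', job.id, job.error_details);
  }

  // Acknowledge quickly; do heavy processing asynchronously
  res.sendStatus(200);
});

app.listen(3000);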

List All Jobs

Retrieve all document processing jobs for your organization.

curl -X GET "https://api.adteco.com/v1/documents?status=completed&limit=20" \
  -H "Authorization: Bearer sk_live_your_api_key"

JavaScript

const params = new URLSearchParams({
  status: 'completed',
  limit: '20',
  offset: '0',
});

const response = await fetch(
  `https://api.adteco.com/v1/documents?${params}`,
  {
    headers: {
      'Authorization': 'Bearer sk_live_your_api_key',
    },
  }
);

const data = await response.json();
console.log(`Found ${data.jobs.length} jobs`);

Python

params = {
    'status': 'completed',
    'limit': 20,
    'offset': 0,
}

response = requests.get(
    'https://api.adteco.com/v1/documents',
    headers={'Authorization': 'Bearer sk_live_your_api_key'},
    params=params,
)

data = response.json()
print(f"Found {len(data['jobs'])} jobs")

Query Parameters

  • extractor_id (string): Filter by extractor
  • status (string): Filter by status
  • created_after (string): Return only jobs created after this ISO 8601 timestamp
  • created_before (string): Return only jobs created before this ISO 8601 timestamp
  • limit (number): Maximum results per page (default: 50, max: 100)
  • offset (number): Result offset for pagination

Response

{
  "jobs": [
    {
      "id": "job_abc123...",
      "extractor_id": "ext_def456...",
      "status": "completed",
      "extracted_data": {...},
      "cost_credits": 2,
      "created_at": "2024-11-23T10:00:00Z",
      "completed_at": "2024-11-23T10:00:05Z"
    }
  ],
  "total": 42,
  "limit": 50,
  "offset": 0
}
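
The total, limit, and offset fields support offset-based pagination. A sketch that pages through every job (the helper name is illustrative):

async function listAllJobs(): Promise<any[]> {
  const jobs: any[] = [];
  const limit = 100; // maximum page size per the query parameters above
  let offset = 0;

  while (true) {
    const params = new URLSearchParams({ limit: String(limit), offset: String(offset) });
    const response = await fetch(`https://api.adteco.com/v1/documents?${params}`, {
      headers: { 'Authorization': 'Bearer sk_live_your_api_key' },
    });
    const page = await response.json();

    jobs.push(...page.jobs);
    offset += page.jobs.length;

    // Stop once everything reported by `total` has been collected
    if (page.jobs.length === 0 || offset >= page.total) break;
  }

  return jobs;
}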

Confidence Scores

Every extracted field includes a confidence score between 0 and 1 indicating how certain the AI is about the extracted value.

Interpreting Confidence

  • 0.95-1.0 (Very High): Safe to auto-process
  • 0.85-0.94 (High): Generally reliable
  • 0.70-0.84 (Medium): Consider manual review
  • 0.50-0.69 (Low): Manual review recommended
  • <0.50 (Very Low): Manual review required

Using Confidence Scores

function shouldReviewField(field: string, value: any, confidence: number): boolean {
  // Critical financial fields need higher confidence
  const criticalFields = ['total_amount', 'payment_amount'];
  const confidenceThreshold = criticalFields.includes(field) ? 0.90 : 0.70;

  return confidence < confidenceThreshold;
}

// Check each field
const job = await getJob('job_abc123...');
const fieldsToReview = [];

for (const [field, value] of Object.entries(job.extracted_data)) {
  const confidence = job.confidence[field];

  if (shouldReviewField(field, value, confidence)) {
    fieldsToReview.push({ field, value, confidence });
  }
}

if (fieldsToReview.length > 0) {
  console.log('Fields requiring review:', fieldsToReview);
  // Send to manual review queue
} else {
  // Auto-process with high confidence
  await processInvoice(job.extracted_data);
}

Error Handling

Handle different failure scenarios gracefully.

Common Error Codes

  • document_too_large: File exceeds 10 MB; compress or split the document
  • document_unreadable: Cannot extract text; improve scan quality
  • unsupported_format: Invalid file format; use a supported format
  • insufficient_credits: Not enough credits; purchase more credits
  • extractor_not_found: Invalid extractor ID; check that the extractor exists
  • rate_limit_exceeded: Too many requests; reduce your request rate

Example Error Handling

async function processDocument(extractorId: string, documentPath: string) {
  try {
    // Submit the document (submitDocument wraps the POST /v1/documents request shown earlier)
    const job = await submitDocument(extractorId, documentPath);

    // Wait for results
    const results = await waitForResults(job.id);

    return results;
  } catch (error) {
    if (error.code === 'document_unreadable') {
      console.error('Document quality too low. Please rescan.');
      // Notify user to provide better scan
    } else if (error.code === 'insufficient_credits') {
      console.error('Out of credits. Purchase more to continue.');
      // Redirect to billing page
    } else if (error.code === 'rate_limit_exceeded') {
      console.error('Rate limit exceeded. Retrying in 60 seconds...');
      // Wait and retry once; repeated failures warrant exponential backoff
      await sleep(60000); // sleep(ms) is a small helper, e.g. (ms) => new Promise(r => setTimeout(r, ms))
      return processDocument(extractorId, documentPath);
    } else {
      console.error('Unexpected error:', error);
      // Log to error tracking service
    }
  }
}

Metadata

Attach custom metadata to jobs for tracking and organization.

const response = await fetch('https://api.adteco.com/v1/documents', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk_live_your_api_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    extractor_id: 'ext_abc123...',
    document: base64Document,
    mime_type: 'application/pdf',
    metadata: {
      // Your custom data
      customer_id: 'cust_123',
      invoice_type: 'recurring',
      source: 'email_attachment',
      processed_by: 'user_456',
      tags: ['urgent', 'large_amount'],
    },
  }),
});

Metadata is returned with job results and included in webhook notifications.

Metadata Limits: Maximum 10 KB per job. Metadata is not indexed or searchable.
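
Since metadata is echoed back with job results, you can use it to route completed jobs. A minimal sketch, assuming the metadata appears as a metadata field on the retrieved job and reusing the hypothetical getJob helper from earlier examples:

// getJob wraps GET /v1/documents/{id}, as in the examples above
const job = await getJob('job_abc123...');

if (job.status === 'completed' && job.metadata?.invoice_type === 'recurring') {
  // Route recurring invoices for this customer to a dedicated workflow (illustrative)
  console.log('Recurring invoice for customer', job.metadata.customer_id);
}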

Document URLs

Successfully processed documents are stored securely and accessible via the document_url field.

Accessing Documents

import fs from 'fs';

// getJob wraps the GET /v1/documents/{id} request shown earlier
const job = await getJob('job_abc123...');

// Document URL is pre-signed and expires after 24 hours
const documentUrl = job.document_url;

// Download the original document
const response = await fetch(documentUrl);
const blob = await response.blob();

// Save locally
const buffer = Buffer.from(await blob.arrayBuffer());
fs.writeFileSync('downloaded_invoice.pdf', buffer);

Security: Document URLs are pre-signed and expire after 24 hours. Generate new URLs by fetching the job again.

Best Practices

Document Quality

For best extraction results:

  • Resolution: Minimum 150 DPI, recommended 300+ DPI
  • Format: PDF is preferred; avoid low-quality JPEGs
  • Orientation: Ensure documents are right-side up
  • Cropping: Include full document pages
  • Clarity: Avoid blurry or distorted scans

Batch Processing

Process multiple documents efficiently:

async function processBatch(extractorId: string, documents: string[]) {
  // Submit all documents in parallel
  const submissions = documents.map(doc =>
    fetch('https://api.adteco.com/v1/documents', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer sk_live_your_api_key',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        extractor_id: extractorId,
        document: doc,
        mime_type: 'application/pdf',
      }),
    })
  );

  // Each submission resolves to a Response; parse the JSON bodies to get the job objects
  const responses = await Promise.all(submissions);
  const jobs = await Promise.all(responses.map((r) => r.json()));

  // Use webhooks to receive results as they complete, or poll for results
  return jobs.map((j) => j.id);
}

Cost Optimization

Minimize credit usage:

  • Use test keys during development
  • Test extractors thoroughly before production use
  • Implement quality checks before submission (see the sketch below)
  • Cache results to avoid reprocessing
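
A minimal sketch of a pre-submission check that enforces the supported formats and the 10 MB limit before spending credits (the MIME-type list mirrors the Supported File Types table; the helper name is illustrative):

import fs from 'fs';
import path from 'path';

const MAX_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB limit from the Supported File Types table

// Extension-to-MIME map for the supported formats
const SUPPORTED_TYPES: Record<string, string> = {
  '.pdf': 'application/pdf',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.webp': 'image/webp',
  '.tif': 'image/tiff',
  '.tiff': 'image/tiff',
};

function validateDocument(filePath: string): { mimeType: string } {
  const mimeType = SUPPORTED_TYPES[path.extname(filePath).toLowerCase()];
  if (!mimeType) {
    throw new Error(`Unsupported format: ${filePath}`); // would otherwise be rejected with unsupported_format
  }

  const { size } = fs.statSync(filePath);
  if (size > MAX_SIZE_BYTES) {
    throw new Error(`File is ${size} bytes; exceeds the 10 MB limit`); // would be rejected with document_too_large
  }

  return { mimeType };
}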

Next Steps