DocExtract API

Extractors

Create and manage extraction templates for document processing

Extractors are reusable templates that define what data to extract from your documents. Each extractor specifies the document type and the fields you want to extract.

The Extractor Object

{
  "id": "ext_abc123...",
  "org_id": "org_xyz789...",
  "name": "Invoice Extractor",
  "document_type": "invoice",
  "field_definitions": [
    {
      "name": "invoice_number",
      "type": "text",
      "required": true,
      "description": "The unique invoice identifier"
    },
    {
      "name": "total_amount",
      "type": "currency",
      "required": true,
      "description": "Total amount due"
    }
  ],
  "is_active": true,
  "created_at": "2024-11-23T10:00:00Z",
  "updated_at": "2024-11-23T10:00:00Z",
  "created_by": "user_123",
  "usage_count": 42
}

Attributes

Attribute          Type      Description
id                 string    Unique identifier for the extractor
org_id             string    Organization ID that owns this extractor
name               string    Display name for the extractor
document_type      string    Type of document this extractor processes
field_definitions  array     Array of field definition objects
is_active          boolean   Whether the extractor is active
created_at         string    ISO 8601 timestamp of creation
updated_at         string    ISO 8601 timestamp of last update
created_by         string    User ID who created this extractor
usage_count        number    Number of documents processed with this extractor
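In client code, it can help to turn the extractor object into a typed structure. The following is a minimal sketch and is not part of an official SDK. It parses the full object returned by the create and get endpoints. The Extractor class and the parse_timestamp helper are hypothetical names. The sketch expects every attribute listed above to be present. Items from the list endpoint show only a subset of attributes, so parsing those would need .get() with defaults.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Extractor:
    """Hypothetical typed wrapper around a full extractor object."""

    id: str
    org_id: str
    name: str
    document_type: str
    field_definitions: List[Dict[str, Any]]
    is_active: bool
    created_at: datetime
    updated_at: datetime
    created_by: str
    usage_count: int

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Extractor":
        # Every key is read with [] so a missing attribute fails loudly
        return cls(
            id=data["id"],
            org_id=data["org_id"],
            name=data["name"],
            document_type=data["document_type"],
            field_definitions=data["field_definitions"],
            is_active=data["is_active"],
            created_at=parse_timestamp(data["created_at"]),
            updated_at=parse_timestamp(data["updated_at"]),
            created_by=data["created_by"],
            usage_count=data["usage_count"],
        )
```

For example, Extractor.from_dict(response.json()) after a get request would give attribute access and timezone-aware datetimes.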

Field Definitions

Each field definition specifies a data point to extract from documents.

Field Types

Type      Description             Example
text      Plain text string       "ACME Corporation"
number    Numeric value           42
currency  Monetary amount         1250.50
date      Date value (ISO 8601)   "2024-11-23"
email     Email address           "contact@acme.com"
phone     Phone number            "+1-555-0123"
boolean   True/false value        true
array     List of values          ["item1", "item2"]
object    Nested object           {"key": "value"}
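When you post-process results in your own code, a local check against these types can flag values that came back in an unexpected shape. This is a rough sketch under two assumptions. First, values arrive as the JSON types shown in the Example column: currency as a plain number and date as an ISO 8601 string. Second, the email test is a deliberately loose regex, not the service's own validation. matches_field_type is a hypothetical helper name.

```python
import re
from datetime import date
from typing import Any

# Deliberately loose: one "@", no whitespace, and a dot in the domain part
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def matches_field_type(value: Any, field_type: str) -> bool:
    """Roughly check that an extracted value has the JSON shape of a field type."""
    if field_type in ("text", "phone"):
        return isinstance(value, str)
    if field_type in ("number", "currency"):
        # bool is a subclass of int in Python, so rule it out explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "date":
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
        return True
    if field_type == "email":
        return isinstance(value, str) and _EMAIL_RE.fullmatch(value) is not None
    if field_type == "boolean":
        return isinstance(value, bool)
    if field_type == "array":
        return isinstance(value, list)
    if field_type == "object":
        return isinstance(value, dict)
    raise ValueError(f"unknown field type: {field_type!r}")
```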

Field Attributes

Attribute      Type     Required  Description
name           string   Yes       Field identifier (snake_case recommended)
type           string   Yes       Field type (see types above)
required       boolean  Yes       Whether this field must be extracted
description    string   Yes       Description to guide AI extraction
validation     object   No        Validation rules (regex, min/max, etc.)
default_value  any      No        Default value if not found
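A small builder can catch an invalid field type or an empty description before a request is sent. make_field below is a hypothetical helper, not part of the API. It checks only what the table above documents, and the server may enforce further rules.

```python
from typing import Any, Dict, Optional

# Field types listed in the Field Types table
FIELD_TYPES = frozenset(
    {"text", "number", "currency", "date", "email", "phone", "boolean", "array", "object"}
)

# Sentinel so that None can still be passed as an explicit default_value
_UNSET: Any = object()


def make_field(
    name: str,
    field_type: str,
    description: str,
    *,
    required: bool = False,
    validation: Optional[Dict[str, Any]] = None,
    default_value: Any = _UNSET,
) -> Dict[str, Any]:
    """Build one field definition dict, rejecting unknown types and empty descriptions."""
    if field_type not in FIELD_TYPES:
        raise ValueError(f"unsupported field type: {field_type!r}")
    if not description.strip():
        raise ValueError("description must not be empty; it guides extraction")
    definition: Dict[str, Any] = {
        "name": name,
        "type": field_type,
        "required": required,
        "description": description,
    }
    # Optional attributes are included only when supplied
    if validation is not None:
        definition["validation"] = validation
    if default_value is not _UNSET:
        definition["default_value"] = default_value
    return definition
```

A list of make_field(...) results can be passed directly as field_definitions in a create or update request.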

Validation Rules

Add validation to ensure extracted data meets your requirements:

{
  "name": "invoice_number",
  "type": "text",
  "required": true,
  "description": "Invoice number starting with INV-",
  "validation": {
    "pattern": "^INV-[0-9]{4,}$",
    "min_length": 8,
    "max_length": 20
  }
}

Available Validation Rules

Text Fields:

  • pattern: Regex pattern
  • min_length: Minimum string length
  • max_length: Maximum string length

Number/Currency Fields:

  • min: Minimum value
  • max: Maximum value

Date Fields:

  • min_date: Earliest allowed date
  • max_date: Latest allowed date
  • format: Expected date format

Email Fields:

  • domain: Required email domain (e.g., "@company.com")
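You can also mirror these rules client-side, for example to re-check values after editing them by hand. check_rules is a hypothetical helper that approximates each rule listed above, and the service's exact semantics may differ. It skips format because that rule's syntax is not specified here. It also assumes that dates are ISO 8601 strings and that domain includes the leading "@", as in the example.

```python
import re
from datetime import date
from typing import Any, Dict, List


def check_rules(value: Any, rules: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations for one extracted value (empty if it passes)."""
    errors: List[str] = []

    # Text rules; re.search honours the ^ and $ anchors in the pattern
    if "pattern" in rules and not re.search(rules["pattern"], str(value)):
        errors.append(f"does not match pattern {rules['pattern']!r}")
    if "min_length" in rules and len(value) < rules["min_length"]:
        errors.append(f"shorter than {rules['min_length']} characters")
    if "max_length" in rules and len(value) > rules["max_length"]:
        errors.append(f"longer than {rules['max_length']} characters")

    # Number/currency rules
    if "min" in rules and value < rules["min"]:
        errors.append(f"less than {rules['min']}")
    if "max" in rules and value > rules["max"]:
        errors.append(f"greater than {rules['max']}")

    # Date rules: ISO 8601 strings compared as calendar dates
    if "min_date" in rules and date.fromisoformat(value) < date.fromisoformat(rules["min_date"]):
        errors.append(f"earlier than {rules['min_date']}")
    if "max_date" in rules and date.fromisoformat(value) > date.fromisoformat(rules["max_date"]):
        errors.append(f"later than {rules['max_date']}")

    # Email rule: domain is assumed to include the leading "@"
    if "domain" in rules and not str(value).lower().endswith(rules["domain"].lower()):
        errors.append(f"not in domain {rules['domain']}")

    return errors
```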

Create an Extractor

Create a new extraction template.

curl -X POST https://api.adteco.com/v1/extractors \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Purchase Order Extractor",
    "document_type": "purchase_order",
    "field_definitions": [
      {
        "name": "po_number",
        "type": "text",
        "required": true,
        "description": "Purchase order number"
      },
      {
        "name": "order_date",
        "type": "date",
        "required": true,
        "description": "Date the PO was created"
      },
      {
        "name": "total_amount",
        "type": "currency",
        "required": true,
        "description": "Total order amount"
      },
      {
        "name": "line_items",
        "type": "array",
        "required": false,
        "description": "Array of items ordered"
      }
    ]
  }'

import requests

response = requests.post(
    'https://api.adteco.com/v1/extractors',
    headers={
        'Authorization': 'Bearer sk_live_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'name': 'Purchase Order Extractor',
        'document_type': 'purchase_order',
        'field_definitions': [
            {
                'name': 'po_number',
                'type': 'text',
                'required': True,
                'description': 'Purchase order number',
            },
            {
                'name': 'order_date',
                'type': 'date',
                'required': True,
                'description': 'Date the PO was created',
            },
            {
                'name': 'total_amount',
                'type': 'currency',
                'required': True,
                'description': 'Total order amount',
            },
            {
                'name': 'line_items',
                'type': 'array',
                'required': False,
                'description': 'Array of items ordered',
            },
        ],
    },
)

response.raise_for_status()  # raise an HTTPError on 4xx/5xx responses
extractor = response.json()
print('Extractor created:', extractor['id'])

Response

{
  "id": "ext_abc123...",
  "org_id": "org_xyz789...",
  "name": "Purchase Order Extractor",
  "document_type": "purchase_order",
  "field_definitions": [...],
  "is_active": true,
  "created_at": "2024-11-23T10:00:00Z",
  "updated_at": "2024-11-23T10:00:00Z",
  "created_by": "user_123",
  "usage_count": 0
}

List All Extractors

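Retrieve the extractors in your organization, optionally filtered by document type and active status. Results are paginated with limit and offset.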

curl -X GET "https://api.adteco.com/v1/extractors?document_type=invoice&is_active=true" \
  -H "Authorization: Bearer sk_live_your_api_key"

import requests

params = {
    'document_type': 'invoice',
    'is_active': 'true',
}

response = requests.get(
    'https://api.adteco.com/v1/extractors',
    headers={'Authorization': 'Bearer sk_live_your_api_key'},
    params=params,
)

response.raise_for_status()
data = response.json()
print(f"Found {len(data['extractors'])} extractors")

Query Parameters

Parameter      Type     Description
document_type  string   Filter by document type
is_active      boolean  Filter by active status
limit          number   Max results (default: 50, max: 100)
offset         number   Results offset for pagination

Response

{
  "extractors": [
    {
      "id": "ext_abc123...",
      "name": "Invoice Extractor",
      "document_type": "invoice",
      "field_definitions": [...],
      "is_active": true,
      "created_at": "2024-11-23T10:00:00Z",
      "usage_count": 42
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}
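Because limit is capped at 100, collecting every extractor means requesting pages until the reported total is reached. This is a minimal sketch that assumes total counts all matching extractors, not just the current page. iter_extractors and make_http_fetcher are hypothetical helpers. The page fetcher is injected so the paging logic can run without network access.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_extractors(
    fetch_page: Callable[[int, int], Page], limit: int = 100
) -> Iterator[Dict[str, Any]]:
    """Yield every extractor by requesting successive limit/offset pages."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["extractors"]
        yield from items
        offset += len(items)
        # Stop once `total` results have been seen; the empty-page check
        # also guards against looping forever if `total` is inconsistent.
        if not items or offset >= page["total"]:
            return


def make_http_fetcher(api_key: str, **filters: Any) -> Callable[[int, int], Page]:
    """Build a fetch_page function that calls the list endpoint with requests."""
    import requests  # imported here so iter_extractors itself needs only the stdlib

    def fetch_page(limit: int, offset: int) -> Page:
        response = requests.get(
            "https://api.adteco.com/v1/extractors",
            headers={"Authorization": f"Bearer {api_key}"},
            params={**filters, "limit": limit, "offset": offset},
        )
        response.raise_for_status()
        return response.json()

    return fetch_page
```

For example, looping over iter_extractors(make_http_fetcher('sk_live_your_api_key', document_type='invoice')) would walk all invoice extractors 100 at a time.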

Get Extractor by ID

Retrieve a specific extractor.

curl -X GET https://api.adteco.com/v1/extractors/ext_abc123... \
  -H "Authorization: Bearer sk_live_your_api_key"

import requests

extractor_id = 'ext_abc123...'

response = requests.get(
    f'https://api.adteco.com/v1/extractors/{extractor_id}',
    headers={'Authorization': 'Bearer sk_live_your_api_key'},
)

response.raise_for_status()
extractor = response.json()

Response

Returns the full extractor object (see The Extractor Object).

Update an Extractor

Update an existing extractor. Only the attributes included in the request body are changed; omitted attributes keep their current values.

curl -X PATCH https://api.adteco.com/v1/extractors/ext_abc123... \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Updated Invoice Extractor",
    "field_definitions": [
      {
        "name": "invoice_number",
        "type": "text",
        "required": true,
        "description": "Invoice number"
      },
      {
        "name": "due_date",
        "type": "date",
        "required": true,
        "description": "Payment due date"
      }
    ]
  }'

import requests

response = requests.patch(
    'https://api.adteco.com/v1/extractors/ext_abc123...',
    headers={
        'Authorization': 'Bearer sk_live_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'name': 'Updated Invoice Extractor',
        'field_definitions': [
            {
                'name': 'invoice_number',
                'type': 'text',
                'required': True,
                'description': 'Invoice number',
            },
            {
                'name': 'due_date',
                'type': 'date',
                'required': True,
                'description': 'Payment due date',
            },
        ],
    },
)

response.raise_for_status()
extractor = response.json()

Warning: Changes to field definitions apply only to document processing jobs run after the update. Existing jobs are not affected.

Response

Returns the updated extractor object.
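The update example sends the complete field_definitions list. That suggests the list is replaced as a whole when it is included, but this page does not say so explicitly, so treat it as an assumption. Under that assumption, a safe way to add or change one field is read-modify-write: fetch the extractor, edit the list locally, and send the whole list back. upsert_field is a hypothetical helper for the local edit.

```python
from typing import Any, Dict, List

FieldDef = Dict[str, Any]


def upsert_field(field_definitions: List[FieldDef], new_field: FieldDef) -> List[FieldDef]:
    """Return a new list where new_field replaces the same-named field, or is appended."""
    result: List[FieldDef] = []
    replaced = False
    for existing in field_definitions:
        if existing["name"] == new_field["name"]:
            result.append(dict(new_field))  # replace in position to preserve field order
            replaced = True
        else:
            result.append(existing)
    if not replaced:
        result.append(dict(new_field))
    return result
```

The returned list would then go in the field_definitions key of the PATCH body. The input list is not modified, so the fetched extractor stays intact if the request fails.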

Duplicate an Extractor

Create a copy of an existing extractor.

curl -X POST https://api.adteco.com/v1/extractors/ext_abc123.../duplicate \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "new_name": "Invoice Extractor (Copy)"
  }'

import requests

response = requests.post(
    'https://api.adteco.com/v1/extractors/ext_abc123.../duplicate',
    headers={
        'Authorization': 'Bearer sk_live_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'new_name': 'Invoice Extractor (Copy)',
    },
)

response.raise_for_status()
duplicated_extractor = response.json()

Response

Returns the new extractor, a copy of the original with its own ID.

Delete an Extractor

Permanently delete an extractor.

curl -X DELETE https://api.adteco.com/v1/extractors/ext_abc123... \
  -H "Authorization: Bearer sk_live_your_api_key"

import requests

response = requests.delete(
    'https://api.adteco.com/v1/extractors/ext_abc123...',
    headers={'Authorization': 'Bearer sk_live_your_api_key'},
)

if response.status_code == 204:
    print('Extractor deleted successfully')
else:
    print('Delete failed:', response.status_code, response.text)

Warning: This action is irreversible. Deleting an extractor will not affect existing jobs, but you won't be able to process new documents with this template.

Response

Returns 204 No Content on success.

Common Document Types

The following document types are pre-defined, each with a recommended set of fields:

Invoice

{
  "document_type": "invoice",
  "field_definitions": [
    {
      "name": "invoice_number",
      "type": "text",
      "required": true,
      "description": "Unique invoice identifier"
    },
    {
      "name": "invoice_date",
      "type": "date",
      "required": true,
      "description": "Invoice issue date"
    },
    {
      "name": "due_date",
      "type": "date",
      "required": false,
      "description": "Payment due date"
    },
    {
      "name": "vendor_name",
      "type": "text",
      "required": true,
      "description": "Vendor or supplier name"
    },
    {
      "name": "vendor_address",
      "type": "text",
      "required": false,
      "description": "Vendor address"
    },
    {
      "name": "subtotal",
      "type": "currency",
      "required": false,
      "description": "Subtotal before tax"
    },
    {
      "name": "tax_amount",
      "type": "currency",
      "required": false,
      "description": "Total tax amount"
    },
    {
      "name": "total_amount",
      "type": "currency",
      "required": true,
      "description": "Total amount due"
    },
    {
      "name": "line_items",
      "type": "array",
      "required": false,
      "description": "Array of line items with description, quantity, price"
    }
  ]
}

Receipt

{
  "document_type": "receipt",
  "field_definitions": [
    {
      "name": "merchant_name",
      "type": "text",
      "required": true,
      "description": "Name of the merchant"
    },
    {
      "name": "transaction_date",
      "type": "date",
      "required": true,
      "description": "Date of purchase"
    },
    {
      "name": "total_amount",
      "type": "currency",
      "required": true,
      "description": "Total amount paid"
    },
    {
      "name": "payment_method",
      "type": "text",
      "required": false,
      "description": "Payment method used (e.g., Visa, Cash)"
    },
    {
      "name": "items",
      "type": "array",
      "required": false,
      "description": "List of purchased items"
    }
  ]
}

Purchase Order

{
  "document_type": "purchase_order",
  "field_definitions": [
    {
      "name": "po_number",
      "type": "text",
      "required": true,
      "description": "Purchase order number"
    },
    {
      "name": "order_date",
      "type": "date",
      "required": true,
      "description": "Date the PO was created"
    },
    {
      "name": "delivery_date",
      "type": "date",
      "required": false,
      "description": "Expected delivery date"
    },
    {
      "name": "supplier_name",
      "type": "text",
      "required": true,
      "description": "Supplier name"
    },
    {
      "name": "buyer_name",
      "type": "text",
      "required": false,
      "description": "Buyer or requester name"
    },
    {
      "name": "total_amount",
      "type": "currency",
      "required": true,
      "description": "Total order amount"
    },
    {
      "name": "line_items",
      "type": "array",
      "required": false,
      "description": "Ordered items with quantities and prices"
    }
  ]
}

Best Practices

Field Descriptions

Write clear, descriptive field descriptions to improve extraction accuracy:

Good:

{
  "name": "invoice_number",
  "description": "The unique invoice identifier, typically starting with 'INV-' followed by numbers"
}

Bad:

{
  "name": "invoice_number",
  "description": "Invoice number"
}

Required vs Optional Fields

  • Mark fields as required: true only if they're critical for your use case
  • Optional fields give the AI flexibility and reduce extraction failures when a value is absent from the document
  • Consider making most fields optional and handling missing data in your application
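Handling missing data in your application can be as simple as filling absent optional fields and reporting absent required ones. This sketch makes a hypothetical assumption about the result format: a job's extracted data arrives as a flat dict mapping field names to values, with missing values either omitted or null. apply_field_defaults is an illustrative name, not an API function.

```python
from typing import Any, Dict, List, Tuple


def apply_field_defaults(
    extracted: Dict[str, Any], field_definitions: List[Dict[str, Any]]
) -> Tuple[Dict[str, Any], List[str]]:
    """Fill missing fields from default_value (or None) and list missing required fields."""
    normalized = dict(extracted)
    missing_required: List[str] = []
    for definition in field_definitions:
        name = definition["name"]
        if normalized.get(name) is not None:
            continue  # value was extracted; keep it
        if definition.get("required", False):
            missing_required.append(name)
        # Use the definition's default_value when present, otherwise None
        normalized[name] = definition.get("default_value")
    return normalized, missing_required
```

A non-empty missing_required list is a natural trigger for manual review, while optional gaps are filled and processing continues.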

Field Naming

  • Use snake_case for field names
  • Use descriptive, unambiguous names
  • Avoid abbreviations unless they're standard (e.g., "PO" for Purchase Order)
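You can check these conventions automatically before creating an extractor. This is a minimal sketch, and check_field_names is a hypothetical helper. Its snake_case regex (lowercase letters and digits separated by single underscores, starting with a letter) is one reasonable definition, not a rule the API is documented to enforce. The duplicate-name check is an extra guard against ambiguous names.

```python
import re
from typing import Any, Dict, List, Set

# Lowercase words of letters/digits joined by single underscores, starting with a letter
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def check_field_names(field_definitions: List[Dict[str, Any]]) -> List[str]:
    """Return naming problems found in a list of field definitions."""
    problems: List[str] = []
    seen: Set[str] = set()
    for definition in field_definitions:
        name = definition["name"]
        if not SNAKE_CASE.fullmatch(name):
            problems.append(f"{name!r} is not snake_case")
        if name in seen:
            problems.append(f"{name!r} is defined more than once")
        seen.add(name)
    return problems
```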

Testing Extractors

Before using in production:

  1. Test with 5-10 sample documents
  2. Check confidence scores for all fields
  3. Verify edge cases (missing data, poor quality scans)
  4. Iterate on field descriptions based on results
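You can partly automate steps 2 and 4 by averaging each field's confidence across the sample run and flagging low scorers as candidates for better descriptions. This sketch makes a hypothetical assumption: you have collected one {field_name: confidence} dict per sample, with scores between 0 and 1. The 0.8 threshold is an arbitrary starting point, not a documented recommendation. weak_fields is an illustrative name.

```python
from statistics import mean
from typing import Dict, List


def weak_fields(sample_results: List[Dict[str, float]], threshold: float = 0.8) -> Dict[str, float]:
    """Average each field's confidence across samples and return those below threshold."""
    scores: Dict[str, List[float]] = {}
    for result in sample_results:
        # A field missing from a sample simply contributes no score for that sample
        for name, confidence in result.items():
            scores.setdefault(name, []).append(confidence)
    averages = {name: mean(values) for name, values in scores.items()}
    # Sorted by field name for stable, readable output
    return {name: avg for name, avg in sorted(averages.items()) if avg < threshold}
```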

Next Steps