Extractors
Create and manage extraction templates for document processing
Extractors are reusable templates that define what data to extract from your documents. Each extractor specifies the document type and the fields you want to extract.
The Extractor Object
{
  "id": "ext_abc123...",
  "org_id": "org_xyz789...",
  "name": "Invoice Extractor",
  "document_type": "invoice",
  "field_definitions": [
    {
      "name": "invoice_number",
      "type": "text",
      "required": true,
      "description": "The unique invoice identifier"
    },
    {
      "name": "total_amount",
      "type": "currency",
      "required": true,
      "description": "Total amount due"
    }
  ],
  "is_active": true,
  "created_at": "2024-11-23T10:00:00Z",
  "updated_at": "2024-11-23T10:00:00Z",
  "created_by": "user_123",
  "usage_count": 42
}

Attributes
| Attribute | Type | Description |
|---|---|---|
| id | string | Unique identifier for the extractor |
| org_id | string | ID of the organization that owns this extractor |
| name | string | Display name for the extractor |
| document_type | string | Type of document this extractor processes |
| field_definitions | array | Array of field definition objects |
| is_active | boolean | Whether the extractor is active |
| created_at | string | ISO 8601 timestamp of creation |
| updated_at | string | ISO 8601 timestamp of last update |
| created_by | string | ID of the user who created this extractor |
| usage_count | number | Number of documents processed with this extractor |
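The attribute table above can be mirrored as type hints on the client side. The following is a minimal sketch, not an official SDK type: the class names `FieldDefinition` and `Extractor` and the helper `required_field_names` are hypothetical, and mapping `number` to `int` for `usage_count` is an assumption.

```python
from typing import Any, List, TypedDict


class FieldDefinition(TypedDict, total=False):
    # Keys of one entry in field_definitions; not every key is always present.
    name: str
    type: str
    required: bool
    description: str
    validation: dict
    default_value: Any


class Extractor(TypedDict):
    # Hypothetical client-side mirror of the extractor object shown above.
    id: str
    org_id: str
    name: str
    document_type: str
    field_definitions: List[FieldDefinition]
    is_active: bool
    created_at: str  # ISO 8601 timestamp
    updated_at: str  # ISO 8601 timestamp
    created_by: str
    usage_count: int


def required_field_names(extractor: Extractor) -> List[str]:
    """Return the names of the fields marked as required."""
    return [f["name"] for f in extractor["field_definitions"] if f.get("required")]
```

A type checker can then flag misspelled keys when code reads a parsed response.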
Field Definitions
Each field definition specifies a data point to extract from documents.
Field Types
| Type | Description | Example |
|---|---|---|
| text | Plain text string | "ACME Corporation" |
| number | Numeric value | 42 |
| currency | Monetary amount | 1250.50 |
| date | Date value (ISO 8601) | "2024-11-23" |
| email | Email address | "contact@acme.com" |
| phone | Phone number | "+1-555-0123" |
| boolean | True/false value | true |
| array | List of values | ["item1", "item2"] |
| object | Nested object | {"key": "value"} |
Field Attributes
| Attribute | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Field identifier (snake_case recommended) |
| type | string | Yes | Field type (see Field Types above) |
| required | boolean | Yes | Whether this field must be extracted |
| description | string | Yes | Description to guide AI extraction |
| validation | object | No | Validation rules (regex, min/max, etc.) |
| default_value | any | No | Default value if the field is not found |
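Checking a field definition locally before sending it can catch typos early. The sketch below is illustrative only: the helper `check_field_definition` is hypothetical, it relies solely on the two tables above, and the API may apply additional or different checks.

```python
# Field types listed in the Field Types table.
ALLOWED_TYPES = {
    "text", "number", "currency", "date", "email",
    "phone", "boolean", "array", "object",
}
# Attributes marked "Yes" in the Required column of the Field Attributes table.
REQUIRED_ATTRIBUTES = ("name", "type", "required", "description")


def check_field_definition(field: dict) -> list:
    """Return a list of problems found in one field definition (empty if none)."""
    problems = []
    for attr in REQUIRED_ATTRIBUTES:
        if attr not in field:
            problems.append(f"missing attribute: {attr}")
    if "type" in field and field["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {field['type']}")
    if "required" in field and not isinstance(field["required"], bool):
        problems.append("required must be a boolean")
    return problems
```

For example, a definition with `"type": "money"` and no description would be reported with two problems, while the `invoice_number` definition from the extractor object passes.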
Validation Rules
Add validation to ensure extracted data meets your requirements:
{
  "name": "invoice_number",
  "type": "text",
  "required": true,
  "description": "Invoice number starting with INV-",
  "validation": {
    "pattern": "^INV-[0-9]{4,}$",
    "min_length": 8,
    "max_length": 20
  }
}

Available Validation Rules
Text Fields:
- pattern: Regex pattern
- min_length: Minimum string length
- max_length: Maximum string length
Number/Currency Fields:
- min: Minimum value
- max: Maximum value
Date Fields:
- min_date: Earliest allowed date
- max_date: Latest allowed date
- format: Expected date format
Email Fields:
- domain: Required email domain (e.g., "@company.com")
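To see how these rules behave on sample values, they can be approximated locally. This is a rough sketch under stated assumptions, not the service's validator: `check_value` is a hypothetical helper, the regex is applied with search semantics (so the anchors in the pattern decide whether the whole string must match), and date and email rules are left out because their exact semantics are not described here.

```python
import re


def check_value(value, field: dict) -> list:
    """Return the names of the validation rules that a value violates."""
    rules = field.get("validation", {})
    errors = []
    if field["type"] == "text":
        # Assumption: regex search semantics; anchors in the pattern do the rest.
        if "pattern" in rules and not re.search(rules["pattern"], value):
            errors.append("pattern")
        if "min_length" in rules and len(value) < rules["min_length"]:
            errors.append("min_length")
        if "max_length" in rules and len(value) > rules["max_length"]:
            errors.append("max_length")
    elif field["type"] in ("number", "currency"):
        if "min" in rules and value < rules["min"]:
            errors.append("min")
        if "max" in rules and value > rules["max"]:
            errors.append("max")
    return errors
```

With the `invoice_number` definition above, `"INV-12345"` passes all three rules, whereas `"INV-12"` fails both `pattern` and `min_length`.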
Create an Extractor
Create a new extraction template.
curl -X POST https://api.adteco.com/v1/extractors \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Purchase Order Extractor",
    "document_type": "purchase_order",
    "field_definitions": [
      {
        "name": "po_number",
        "type": "text",
        "required": true,
        "description": "Purchase order number"
      },
      {
        "name": "order_date",
        "type": "date",
        "required": true,
        "description": "Date the PO was created"
      },
      {
        "name": "total_amount",
        "type": "currency",
        "required": true,
        "description": "Total order amount"
      },
      {
        "name": "line_items",
        "type": "array",
        "required": false,
        "description": "Array of items ordered"
      }
    ]
  }'

import requests
response = requests.post(
    'https://api.adteco.com/v1/extractors',
    headers={
        'Authorization': 'Bearer sk_live_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'name': 'Purchase Order Extractor',
        'document_type': 'purchase_order',
        'field_definitions': [
            {
                'name': 'po_number',
                'type': 'text',
                'required': True,
                'description': 'Purchase order number',
            },
            {
                'name': 'order_date',
                'type': 'date',
                'required': True,
                'description': 'Date the PO was created',
            },
            {
                'name': 'total_amount',
                'type': 'currency',
                'required': True,
                'description': 'Total order amount',
            },
            {
                'name': 'line_items',
                'type': 'array',
                'required': False,
                'description': 'Array of items ordered',
            },
        ],
    },
)
extractor = response.json()
print('Extractor created:', extractor['id'])

Response
{
  "id": "ext_abc123...",
  "org_id": "org_xyz789...",
  "name": "Purchase Order Extractor",
  "document_type": "purchase_order",
  "field_definitions": [...],
  "is_active": true,
  "created_at": "2024-11-23T10:00:00Z",
  "updated_at": "2024-11-23T10:00:00Z",
  "created_by": "user_123",
  "usage_count": 0
}

List All Extractors
Retrieve all extractors for your organization.
curl -X GET "https://api.adteco.com/v1/extractors?document_type=invoice&is_active=true" \
  -H "Authorization: Bearer sk_live_your_api_key"

import requests

params = {
    'document_type': 'invoice',
    'is_active': 'true',
}
response = requests.get(
    'https://api.adteco.com/v1/extractors',
    headers={'Authorization': 'Bearer sk_live_your_api_key'},
    params=params,
)
data = response.json()
print(f"Found {len(data['extractors'])} extractors")

Query Parameters
| Parameter | Type | Description |
|---|---|---|
| document_type | string | Filter by document type |
| is_active | boolean | Filter by active status |
| limit | number | Maximum number of results (default: 50, max: 100) |
| offset | number | Number of results to skip, for pagination |
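When an organization has more extractors than fit in one page, `limit` and `offset` can be combined into a loop. The sketch below is one possible approach, assuming the list response carries `extractors` and `total` keys as in the example response for this endpoint. `iter_extractors` and its `fetch_page` callback are hypothetical names; the callback is expected to perform the actual HTTP request and return the parsed JSON.

```python
from typing import Callable, Iterator


def iter_extractors(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every extractor by requesting pages until `total` is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["extractors"]
        yield from items
        offset += len(items)
        # Stop once the reported total is reached or a page comes back empty.
        if not items or offset >= page["total"]:
            break
```

A `fetch_page` implementation could wrap `requests.get` on `https://api.adteco.com/v1/extractors` with `params={'limit': limit, 'offset': offset}` and return `response.json()`. Keeping the HTTP call behind a callback makes the loop easy to test with canned pages.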
Response
{
  "extractors": [
    {
      "id": "ext_abc123...",
      "name": "Invoice Extractor",
      "document_type": "invoice",
      "field_definitions": [...],
      "is_active": true,
      "created_at": "2024-11-23T10:00:00Z",
      "usage_count": 42
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0
}

Get Extractor by ID
Retrieve a specific extractor.
curl -X GET https://api.adteco.com/v1/extractors/ext_abc123... \
  -H "Authorization: Bearer sk_live_your_api_key"

import requests

extractor_id = 'ext_abc123...'
response = requests.get(
    f'https://api.adteco.com/v1/extractors/{extractor_id}',
    headers={'Authorization': 'Bearer sk_live_your_api_key'},
)
extractor = response.json()

Response
Returns the full extractor object (see The Extractor Object).
Update an Extractor
Update an existing extractor. Only provided fields will be updated.
curl -X PATCH https://api.adteco.com/v1/extractors/ext_abc123... \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Updated Invoice Extractor",
    "field_definitions": [
      {
        "name": "invoice_number",
        "type": "text",
        "required": true,
        "description": "Invoice number"
      },
      {
        "name": "due_date",
        "type": "date",
        "required": true,
        "description": "Payment due date"
      }
    ]
  }'

import requests

response = requests.patch(
    'https://api.adteco.com/v1/extractors/ext_abc123...',
    headers={
        'Authorization': 'Bearer sk_live_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'name': 'Updated Invoice Extractor',
        'field_definitions': [
            {
                'name': 'invoice_number',
                'type': 'text',
                'required': True,
                'description': 'Invoice number',
            },
            {
                'name': 'due_date',
                'type': 'date',
                'required': True,
                'description': 'Payment due date',
            },
        ],
    },
)
extractor = response.json()

Warning: Changing field definitions will affect future document processing jobs. Existing jobs are not affected.
Response
Returns the updated extractor object.
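The update example sends a complete `field_definitions` array, which suggests (but does not confirm) that the array is replaced as a whole rather than merged. Under that assumption, adding a single field means sending the existing definitions plus the new one. The helper `with_added_field` below is a hypothetical sketch of building such a PATCH body from an extractor fetched earlier.

```python
def with_added_field(extractor: dict, new_field: dict) -> dict:
    """Build a PATCH body that keeps the existing field definitions and adds one."""
    existing = extractor["field_definitions"]
    if any(f["name"] == new_field["name"] for f in existing):
        raise ValueError(f"field already defined: {new_field['name']}")
    # Assumption: field_definitions is replaced wholesale, so send the full list.
    return {"field_definitions": [*existing, new_field]}
```

The returned dictionary can be passed as the `json=` argument of the `requests.patch` call shown above. The input extractor is left unmodified.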
Duplicate an Extractor
Create a copy of an existing extractor.
curl -X POST https://api.adteco.com/v1/extractors/ext_abc123.../duplicate \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "new_name": "Invoice Extractor (Copy)"
  }'

import requests

response = requests.post(
    'https://api.adteco.com/v1/extractors/ext_abc123.../duplicate',
    headers={
        'Authorization': 'Bearer sk_live_your_api_key',
        'Content-Type': 'application/json',
    },
    json={
        'new_name': 'Invoice Extractor (Copy)',
    },
)
duplicated_extractor = response.json()

Response
Returns the duplicated extractor, which has its own new ID.
Delete an Extractor
Permanently delete an extractor.
curl -X DELETE https://api.adteco.com/v1/extractors/ext_abc123... \
  -H "Authorization: Bearer sk_live_your_api_key"

import requests

response = requests.delete(
    'https://api.adteco.com/v1/extractors/ext_abc123...',
    headers={'Authorization': 'Bearer sk_live_your_api_key'},
)
if response.status_code == 204:
    print('Extractor deleted successfully')

Warning: This action is irreversible. Deleting an extractor will not affect existing jobs, but you won't be able to process new documents with this template.
Response
Returns 204 No Content on success.
Common Document Types
Pre-defined document types with recommended fields:
Invoice
{
  "document_type": "invoice",
  "field_definitions": [
    {
      "name": "invoice_number",
      "type": "text",
      "required": true,
      "description": "Unique invoice identifier"
    },
    {
      "name": "invoice_date",
      "type": "date",
      "required": true,
      "description": "Invoice issue date"
    },
    {
      "name": "due_date",
      "type": "date",
      "required": false,
      "description": "Payment due date"
    },
    {
      "name": "vendor_name",
      "type": "text",
      "required": true,
      "description": "Vendor or supplier name"
    },
    {
      "name": "vendor_address",
      "type": "text",
      "required": false,
      "description": "Vendor address"
    },
    {
      "name": "subtotal",
      "type": "currency",
      "required": false,
      "description": "Subtotal before tax"
    },
    {
      "name": "tax_amount",
      "type": "currency",
      "required": false,
      "description": "Total tax amount"
    },
    {
      "name": "total_amount",
      "type": "currency",
      "required": true,
      "description": "Total amount due"
    },
    {
      "name": "line_items",
      "type": "array",
      "required": false,
      "description": "Array of line items with description, quantity, price"
    }
  ]
}

Receipt
{
  "document_type": "receipt",
  "field_definitions": [
    {
      "name": "merchant_name",
      "type": "text",
      "required": true,
      "description": "Name of the merchant"
    },
    {
      "name": "transaction_date",
      "type": "date",
      "required": true,
      "description": "Date of purchase"
    },
    {
      "name": "total_amount",
      "type": "currency",
      "required": true,
      "description": "Total amount paid"
    },
    {
      "name": "payment_method",
      "type": "text",
      "required": false,
      "description": "Payment method used (e.g., Visa, Cash)"
    },
    {
      "name": "items",
      "type": "array",
      "required": false,
      "description": "List of purchased items"
    }
  ]
}

Purchase Order
{
  "document_type": "purchase_order",
  "field_definitions": [
    {
      "name": "po_number",
      "type": "text",
      "required": true,
      "description": "Purchase order number"
    },
    {
      "name": "order_date",
      "type": "date",
      "required": true,
      "description": "Date the PO was created"
    },
    {
      "name": "delivery_date",
      "type": "date",
      "required": false,
      "description": "Expected delivery date"
    },
    {
      "name": "supplier_name",
      "type": "text",
      "required": true,
      "description": "Supplier name"
    },
    {
      "name": "buyer_name",
      "type": "text",
      "required": false,
      "description": "Buyer or requester name"
    },
    {
      "name": "total_amount",
      "type": "currency",
      "required": true,
      "description": "Total order amount"
    },
    {
      "name": "line_items",
      "type": "array",
      "required": false,
      "description": "Ordered items with quantities and prices"
    }
  ]
}

Best Practices
Field Descriptions
Write clear, descriptive field descriptions to improve extraction accuracy:
Good:
{
  "name": "invoice_number",
  "description": "The unique invoice identifier, typically starting with 'INV-' followed by numbers"
}

Bad:
{
  "name": "invoice_number",
  "description": "Invoice number"
}

Required vs Optional Fields
- Mark fields as required: true only if they're critical for your use case
- Optional fields give the AI flexibility and reduce failures
- Consider making most fields optional and handling missing data in your application
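Handling missing data in the application could look like the sketch below. It assumes the extracted data arrives as a dictionary keyed by field name, which this page does not specify. `apply_defaults` is a hypothetical helper that fills absent optional fields from `default_value` and reports absent required fields instead of failing.

```python
def apply_defaults(extracted: dict, field_definitions: list) -> tuple:
    """Fill absent optional fields and list absent required ones."""
    result = dict(extracted)
    missing_required = []
    for field in field_definitions:
        name = field["name"]
        if result.get(name) is not None:
            continue  # a value was extracted, nothing to do
        if field.get("required"):
            missing_required.append(name)
        else:
            # Fall back to default_value, or None when no default is defined.
            result[name] = field.get("default_value")
    return result, missing_required
```

The caller can then decide whether a non-empty `missing_required` list should route the document to manual review or reject it.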
Field Naming
- Use snake_case for field names
- Use descriptive, unambiguous names
- Avoid abbreviations unless they're standard (e.g., "PO" for Purchase Order)
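The snake_case convention can be checked mechanically before an extractor is created. This is a small illustrative sketch: `is_snake_case` is a hypothetical helper, and the exact definition used here (lowercase letters and digits, words separated by single underscores, starting with a letter) is an assumption, since snake_case is only a recommendation on this page.

```python
import re

# Lowercase words made of letters and digits, joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True for names such as invoice_number or po_number."""
    return SNAKE_CASE.fullmatch(name) is not None
```

Names such as `InvoiceNumber` or `invoice-number` are rejected by this check.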
Testing Extractors
Before using an extractor in production:
- Test with 5-10 sample documents
- Check confidence scores for all fields
- Verify edge cases (missing data, poor quality scans)
- Iterate on field descriptions based on results
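Reviewing confidence scores across the sample documents can be partly automated. The sketch below is hypothetical throughout: this page does not show the shape of extraction results, so it assumes each result maps a field name to an object with a `confidence` number between 0 and 1, and the 0.8 threshold is an arbitrary starting point rather than a documented value.

```python
from collections import defaultdict


def weak_fields(results: list, threshold: float = 0.8) -> dict:
    """Count, per field, how many sample documents scored below the threshold."""
    counts = defaultdict(int)
    for doc in results:
        for name, field in doc.items():
            # Assumed result shape: {"field_name": {"value": ..., "confidence": 0.97}}
            if field["confidence"] < threshold:
                counts[name] += 1
    return dict(counts)
```

Fields that appear repeatedly in the output are good candidates for a more specific description.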