Document Types
Understanding document classification and supported document types
The Email Ingestion service automatically classifies incoming documents into specific types for appropriate processing and NetSuite record creation.
Supported Types
Vendor Bill
NetSuite Record: vendorbill
A vendor bill is an invoice FROM a vendor/supplier TO your company. You are the buyer receiving this invoice and will need to pay it.
Key Indicators:
- "Invoice" or "Bill" in header
- Your company listed as "Bill To" or "Ship To"
- Supplier/vendor letterhead and branding
- Payment terms (Net 30, Due on Receipt, etc.)
- Items or services you purchased
Extracted Fields:
| Field | Description | Required |
|---|---|---|
| vendorName | Name of the vendor/supplier | Yes |
| invoiceNumber | Vendor's invoice number | Yes |
| invoiceDate | Date on the invoice | Yes |
| dueDate | Payment due date | No |
| poNumber | Related purchase order number | No |
| subtotal | Amount before tax | Yes |
| taxAmount | Tax amount | No |
| total | Total amount due | Yes |
| currency | Currency code (USD, EUR, etc.) | No |
| lineItems | Individual line items | No |
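The Required column above can be checked mechanically before a record is created. The following is a minimal sketch, not the service's actual validator: the `ExtractedVendorBill` type and the `validateVendorBill` function are hypothetical names, and the half-cent rounding tolerance on the totals check is an assumption.

```typescript
// Hypothetical shape mirroring the Extracted Fields table; not the service's real type.
interface ExtractedVendorBill {
  vendorName?: string;
  invoiceNumber?: string;
  invoiceDate?: string;
  dueDate?: string;
  poNumber?: string;
  subtotal?: number;
  taxAmount?: number;
  total?: number;
  currency?: string;
}

// Fields marked "Yes" in the Required column.
const REQUIRED_FIELDS = ['vendorName', 'invoiceNumber', 'invoiceDate', 'subtotal', 'total'] as const;

// Returns a list of human-readable problems; an empty list means the bill passed.
function validateVendorBill(bill: ExtractedVendorBill): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = bill[field];
    if (value === undefined || value === '') {
      problems.push(`Missing required field: ${field}`);
    }
  }
  // Assumed consistency check: subtotal + tax should equal total within half a cent.
  if (bill.subtotal !== undefined && bill.total !== undefined) {
    const expected = bill.subtotal + (bill.taxAmount ?? 0);
    if (Math.abs(expected - bill.total) > 0.005) {
      problems.push(`Total ${bill.total} does not equal subtotal + tax (${expected})`);
    }
  }
  return problems;
}
```

A bill that fails a check like this would be a natural candidate for the review queue rather than automatic record creation.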
Line Item Fields:
```typescript
interface VendorBillLineItem {
  itemName: string;
  description?: string;
  quantity: number;
  rate: number;
  amount: number;
  taxCode?: string;
}
```
Customer Invoice
NetSuite Record: invoice
A customer invoice is an invoice FROM your company TO a customer. You are the seller billing the customer.
Key Indicators:
- Your company logo and letterhead
- Customer name in "Bill To" section
- Your terms and conditions
- Your bank details for payment
Extracted Fields:
| Field | Description | Required |
|---|---|---|
| customerName | Name of the customer | Yes |
| invoiceNumber | Your invoice number | Yes |
| invoiceDate | Date you issued the invoice | Yes |
| dueDate | When payment is expected | No |
| subtotal | Amount before tax | Yes |
| taxAmount | Tax charged | No |
| total | Total amount billed | Yes |
| lineItems | Products/services sold | No |
Purchase Order
NetSuite Record: purchaseorder
A purchase order is a request to purchase goods or services. This is NOT an invoice - it's created before the purchase is fulfilled.
Key Indicators:
- "Purchase Order" or "PO" prominently displayed
- PO number (e.g., "PO-2024-001")
- Delivery/shipping instructions
- No payment terms (yet)
- Status indicators (Draft, Pending, Approved)
Extracted Fields:
| Field | Description | Required |
|---|---|---|
| poNumber | Purchase order number | Yes |
| vendorName | Vendor to order from | Yes |
| orderDate | Date PO was created | Yes |
| expectedDate | Expected delivery date | No |
| shipToAddress | Delivery address | No |
| subtotal | Order subtotal | Yes |
| total | Order total | Yes |
| lineItems | Items being ordered | Yes |
Expense Report
NetSuite Record: expensereport
An expense report is a collection of receipts and expenses submitted by an employee for reimbursement.
Key Indicators:
- Multiple receipts or transactions
- Employee name and department
- Expense categories (Travel, Meals, Office Supplies)
- Reimbursement request form
- Approval signatures
Extracted Fields:
| Field | Description | Required |
|---|---|---|
| employeeName | Employee submitting expenses | Yes |
| reportDate | Date of submission | Yes |
| reportPeriod | Period covered (e.g., "Nov 2024") | No |
| totalAmount | Total reimbursement amount | Yes |
| expenseItems | Individual expenses | Yes |
Expense Item Fields:
```typescript
interface ExpenseItem {
  date: string;
  merchant: string;
  category: string; // 'travel' | 'meals' | 'supplies' | 'other'
  description?: string;
  amount: number;
  currency: string;
  receiptAttached: boolean;
}
```
Receipt
No NetSuite Record (may attach to expense report)
A receipt is a simple proof of purchase, typically from a retail transaction.
Key Indicators:
- Single transaction
- Point-of-sale format
- Store name and address
- Date, time, and transaction ID
- Payment method shown
Extracted Fields:
| Field | Description | Required |
|---|---|---|
| merchantName | Store/vendor name | Yes |
| transactionDate | Date of purchase | Yes |
| transactionTime | Time of purchase | No |
| subtotal | Amount before tax | No |
| tax | Tax amount | No |
| total | Total paid | Yes |
| paymentMethod | Cash, Card ending in XXXX | No |
| items | Items purchased | No |
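Because a receipt has no NetSuite record of its own and may be attached to an expense report, one plausible approach is to map it onto the `ExpenseItem` shape from the Expense Report section. This is only a sketch under assumptions: the `ExtractedReceipt` type and the `receiptToExpenseItem` function are hypothetical, `items` is assumed to be a list of strings, and the defaults (`'other'` category, `'USD'` currency) are placeholders, since the receipt fields above carry neither value.

```typescript
// Hypothetical shape mirroring the receipt Extracted Fields table.
interface ExtractedReceipt {
  merchantName: string;
  transactionDate: string;
  transactionTime?: string;
  subtotal?: number;
  tax?: number;
  total: number;
  paymentMethod?: string;
  items?: string[]; // assumed to be plain item names
}

// Same fields as the ExpenseItem interface in the Expense Report section.
interface ExpenseItem {
  date: string;
  merchant: string;
  category: string;
  description?: string;
  amount: number;
  currency: string;
  receiptAttached: boolean;
}

function receiptToExpenseItem(
  receipt: ExtractedReceipt,
  category: string = 'other', // assumed default
  currency: string = 'USD',   // assumed default
): ExpenseItem {
  const item: ExpenseItem = {
    date: receipt.transactionDate,
    merchant: receipt.merchantName,
    category,
    amount: receipt.total,
    currency,
    receiptAttached: true,
  };
  // Use the purchased items as the description when they were extracted.
  if (receipt.items && receipt.items.length > 0) {
    item.description = receipt.items.join(', ');
  }
  return item;
}
```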
General Request
No NetSuite Record
This category is used when:
- The email has no attachments
- The document type cannot be determined
- The confidence score is too low
- The document doesn't match any known type
Handling:
- Sent to the review queue
- No automatic processing
- Manual classification required
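The fallback conditions above can be read as a single decision rule. The sketch below is illustrative only: `resolveDocumentType` and `ClassificationAttempt` are hypothetical names, the identifiers `'customer_invoice'` and `'general_request'` are assumed spellings, and the default threshold of 0.7 is borrowed from the classification pipeline described later on this page.

```typescript
// 'customer_invoice' and 'general_request' are assumed identifiers.
type KnownType = 'vendor_bill' | 'customer_invoice' | 'purchase_order' | 'expense_report' | 'receipt';
type DocumentType = KnownType | 'general_request';

const KNOWN_TYPES: ReadonlySet<string> = new Set([
  'vendor_bill', 'customer_invoice', 'purchase_order', 'expense_report', 'receipt',
]);

function isKnownType(value: string): value is KnownType {
  return KNOWN_TYPES.has(value);
}

// Hypothetical summary of what the classifier produced for one email.
interface ClassificationAttempt {
  hasAttachments: boolean;
  documentType?: string; // raw label from the classifier, if any
  confidence?: number;
}

function resolveDocumentType(attempt: ClassificationAttempt, minConfidence = 0.7): DocumentType {
  // The email has no attachments.
  if (!attempt.hasAttachments) return 'general_request';
  // The type could not be determined or does not match a known type.
  const type = attempt.documentType;
  if (type === undefined || !isKnownType(type)) return 'general_request';
  // The confidence score is too low.
  if (attempt.confidence === undefined || attempt.confidence < minConfidence) return 'general_request';
  return type;
}
```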
Classification Process
AI Classification Pipeline
```mermaid
flowchart TD
    A[Email Received] --> B{Has Attachments?}
    B -->|No| C[General Request]
    B -->|Yes| D[Extract Document]
    D --> E[Email Metadata Analysis]
    E --> F[Document Content Analysis]
    F --> G[AI Classification]
    G --> H{Confidence >= 0.7?}
    H -->|Yes| I[Assign Document Type]
    H -->|No| J[Human Review Queue]
    I --> K[Route to Extractor]
```
Metadata Analysis
The classifier first analyzes email metadata:
```typescript
// Quick classification from email only
const quickResult = classificationService.classifyByEmailMetadata({
  subject: "Invoice #12345 from Acme Corp",
  senderEmail: "billing@acme.com",
  hasAttachments: true
});
// Returns: { documentType: 'vendor_bill', confidence: 0.6 }
```
Pattern Matching:
| Pattern | Inferred Type | Confidence |
|---|---|---|
| Subject contains "invoice" or "bill" | vendor_bill | 0.6 |
| Subject contains "purchase order" | purchase_order | 0.7 |
| Subject contains "expense" | expense_report | 0.6 |
| Sender contains "billing@" | vendor_bill | 0.5 |
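To make the table concrete, here is a minimal sketch of how rules like these could be evaluated. It is not the service's implementation: `classifyByEmailMetadataSketch` is a hypothetical stand-in, matching is assumed to be case-insensitive, and when several rules match the highest confidence is assumed to win.

```typescript
interface EmailMetadata {
  subject: string;
  senderEmail: string;
  hasAttachments: boolean;
}

interface QuickResult {
  documentType: string;
  confidence: number;
}

// Rules transcribed from the Pattern Matching table.
const RULES: Array<{ matches: (m: EmailMetadata) => boolean; result: QuickResult }> = [
  { matches: (m) => /invoice|bill/i.test(m.subject), result: { documentType: 'vendor_bill', confidence: 0.6 } },
  { matches: (m) => /purchase order/i.test(m.subject), result: { documentType: 'purchase_order', confidence: 0.7 } },
  { matches: (m) => /expense/i.test(m.subject), result: { documentType: 'expense_report', confidence: 0.6 } },
  { matches: (m) => m.senderEmail.toLowerCase().includes('billing@'), result: { documentType: 'vendor_bill', confidence: 0.5 } },
];

// Returns the highest-confidence matching rule, or null when nothing matches (assumed behavior).
function classifyByEmailMetadataSketch(meta: EmailMetadata): QuickResult | null {
  let best: QuickResult | null = null;
  for (const rule of RULES) {
    if (rule.matches(meta) && (best === null || rule.result.confidence > best.confidence)) {
      best = rule.result;
    }
  }
  return best;
}
```

With the email from the example above (subject "Invoice #12345 from Acme Corp", sender billing@acme.com), both the subject rule and the sender rule match, and the sketch keeps the subject rule's 0.6.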
Content Analysis
For higher confidence, the AI analyzes document content:
```typescript
const result = await classificationService.classify({
  document: pdfBuffer,
  mimeType: 'application/pdf',
  emailSubject: subject,
  emailBody: body,
  senderEmail: sender,
  orgId: 'org_123'
});
```
Analysis includes:
- Document layout and structure
- Header text and titles
- Key phrases and terminology
- Address blocks (Bill To vs Ship To)
- Amount patterns and totals
- Company logos and branding
Confidence Scoring
| Score Range | Meaning | Action |
|---|---|---|
| 0.9 - 1.0 | Very high confidence | Auto-process |
| 0.7 - 0.9 | High confidence | Process with flag |
| 0.5 - 0.7 | Medium confidence | Review recommended |
| 0.0 - 0.5 | Low confidence | Human review required |
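The score bands can be expressed as a small lookup function. This is a sketch under assumptions: the action identifiers are hypothetical, and each band is assumed to include its lower bound (so exactly 0.7 counts as high confidence, in line with the `Confidence >= 0.7` check in the pipeline flowchart), because the table does not say which band a boundary value belongs to.

```typescript
// Hypothetical identifiers for the Action column.
type ConfidenceAction =
  | 'auto_process'
  | 'process_with_flag'
  | 'review_recommended'
  | 'human_review_required';

function actionForConfidence(score: number): ConfidenceAction {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`Confidence must be between 0 and 1, got ${score}`);
  }
  // Assumption: each band includes its lower bound.
  if (score >= 0.9) return 'auto_process';
  if (score >= 0.7) return 'process_with_flag';
  if (score >= 0.5) return 'review_recommended';
  return 'human_review_required';
}
```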
Distinguishing Similar Documents
Vendor Bill vs Customer Invoice
The key distinction is the direction of money flow:
| Aspect | Vendor Bill | Customer Invoice |
|---|---|---|
| You are | The buyer | The seller |
| Bill To | Your company | Customer |
| Letterhead | Vendor's | Your company's |
| Action | You pay them | They pay you |
Classification Hints:
- Check which company's logo/letterhead appears
- Look at the "Bill To" address
- Identify who is requesting payment
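These hints amount to comparing the parties on the document against your own company name. The sketch below illustrates that comparison only; `inferInvoiceDirection` and `PartyInfo` are hypothetical names, it assumes the letterhead and "Bill To" companies have already been extracted as text, and the name normalization is deliberately simple.

```typescript
// Hypothetical input: company names already extracted from the document.
interface PartyInfo {
  letterheadCompany: string;
  billToCompany: string;
}

// Lowercase and collapse punctuation so "Initech, Inc." matches "Initech Inc".
function normalizeCompanyName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

function inferInvoiceDirection(
  doc: PartyInfo,
  ownCompanyName: string,
): 'vendor_bill' | 'customer_invoice' | 'unknown' {
  const own = normalizeCompanyName(ownCompanyName);
  const billToIsUs = normalizeCompanyName(doc.billToCompany) === own;
  const letterheadIsUs = normalizeCompanyName(doc.letterheadCompany) === own;
  // We are billed by someone else: we are the buyer.
  if (billToIsUs && !letterheadIsUs) return 'vendor_bill';
  // Our letterhead, someone else is billed: we are the seller.
  if (letterheadIsUs && !billToIsUs) return 'customer_invoice';
  // Ambiguous cases are left for content analysis or human review.
  return 'unknown';
}
```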
Vendor Bill vs Purchase Order
| Aspect | Vendor Bill | Purchase Order |
|---|---|---|
| Status | After purchase | Before purchase |
| Purpose | Request payment | Request goods |
| Contains | Payment terms | Delivery dates |
| Created by | Vendor | Buyer |
Receipt vs Vendor Bill
| Aspect | Receipt | Vendor Bill |
|---|---|---|
| Format | Point-of-sale | Business document |
| Items | Retail products | Services/supplies |
| Payment | Already paid | Payment due |
| Detail | Minimal | Comprehensive |
Custom Document Types
For specialized documents, create custom extractors:
```typescript
// Custom document type
const customExtractor = await fetch('/api/extractors', {
  method: 'POST',
  // Declare the body as JSON so the server can parse it
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    name: 'Contract Agreement',
    documentType: 'custom',
    fieldDefinitions: [
      { name: 'contractNumber', type: 'text', required: true },
      { name: 'parties', type: 'array', required: true },
      { name: 'effectiveDate', type: 'date', required: true },
      { name: 'termLength', type: 'text', required: false }
    ]
  })
});
```
Best Practices
Improving Classification Accuracy
- Use clear email subjects: Include document type keywords such as "invoice" or "purchase order"
- Maintain sender consistency: Have each vendor send from the same email address
- Send high-quality documents: Use clear, readable scans
- Configure overrides: Map known senders to the document types they send
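As an illustration of the override idea, the sketch below maps sender addresses to document types. The page does not describe how overrides are actually configured, so everything here is hypothetical: the `senderOverrides` map, the `lookupOverride` function, and the convention that a key starting with `@` applies to a whole domain.

```typescript
// Hypothetical override table: exact addresses first, "@domain" keys as a fallback.
const senderOverrides = new Map<string, string>([
  ['billing@acme.com', 'vendor_bill'],
  ['@supplier.example', 'purchase_order'],
]);

// Returns the overridden document type for a sender, or undefined when none applies.
function lookupOverride(
  senderEmail: string,
  overrides: ReadonlyMap<string, string>,
): string | undefined {
  const email = senderEmail.trim().toLowerCase();
  const exact = overrides.get(email);
  if (exact !== undefined) return exact;
  // Fall back to a domain-wide entry such as "@supplier.example".
  const at = email.lastIndexOf('@');
  return at >= 0 ? overrides.get(email.slice(at)) : undefined;
}
```

A match here would let the pipeline skip or reinforce the metadata-based guess for that sender.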
Handling Low Confidence
When classification confidence is low:
- The document goes to the review queue
- A user manually selects the document type
- The system learns from the correction
- Similar documents are classified more accurately in the future
Multi-Document Emails
If an email contains attachments of different document types, each attachment is classified independently:
```typescript
// Each attachment is classified independently
{
  emailId: 'email_123',
  attachments: [
    { filename: 'invoice.pdf', documentType: 'vendor_bill', confidence: 0.92 },
    { filename: 'receipt.jpg', documentType: 'receipt', confidence: 0.88 }
  ]
}
```
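Since every attachment carries its own type and confidence, each one can be routed separately. The following sketch assumes the result shape shown above and reuses the 0.7 threshold from the pipeline flowchart; `partitionByConfidence` is a hypothetical helper, not part of the service.

```typescript
interface ClassifiedAttachment {
  filename: string;
  documentType: string;
  confidence: number;
}

// Splits attachments into those ready for extraction and those needing human review.
function partitionByConfidence(
  attachments: ClassifiedAttachment[],
  threshold = 0.7, // taken from the "Confidence >= 0.7?" step in the flowchart
): { toExtract: ClassifiedAttachment[]; toReview: ClassifiedAttachment[] } {
  const toExtract: ClassifiedAttachment[] = [];
  const toReview: ClassifiedAttachment[] = [];
  for (const attachment of attachments) {
    if (attachment.confidence >= threshold) {
      toExtract.push(attachment);
    } else {
      toReview.push(attachment);
    }
  }
  return { toExtract, toReview };
}
```

Under this scheme one low-confidence attachment does not hold up the others from the same email.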