Building Effective JSON Schemas for Invoice Extraction
The quality of your JSON schema directly impacts extraction accuracy. A well-designed schema acts as a guide for the AI, helping it understand exactly what data you need and where to find it.
Schema Design Principles
1. Use Descriptive Field Names
Field names serve as hints for the AI. Compare these two approaches:
Less effective:
{
  "f1": "string",
  "f2": "number",
  "f3": "string"
}
More effective:
{
  "vendor_name": "string",
  "total_amount": "number",
  "invoice_date": "string"
}
The AI uses these names to understand what you're looking for.
2. Match Your Document Structure
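Mirror the layout of your invoices. If they contain line items, give the schema a line_items array, and group related details such as vendor information in a nested object: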
{
  "invoice_number": "string",
  "vendor": {
    "name": "string",
    "address": "string",
    "tax_id": "string"
  },
  "line_items": [
    {
      "description": "string",
      "quantity": "number",
      "unit_price": "number",
      "total": "number"
    }
  ],
  "subtotal": "number",
  "tax": "number",
  "total": "number"
}
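One practical benefit of mirroring the document structure is that the extracted numbers can be cross-checked against each other. The sketch below is a minimal illustration in Python, not part of any extraction API: the function name total_mismatches and the 0.01 tolerance are assumptions, and it presumes an invoice where the subtotal is the sum of the line totals and the total is subtotal plus tax. Invoices with discounts or shipping would need extra terms.

```python
import math


def total_mismatches(invoice: dict, tolerance: float = 0.01) -> list[str]:
    """Return arithmetic inconsistencies found in an extracted invoice.

    Hypothetical helper: assumes the field names from the schema above.
    Fields that are missing or null are skipped rather than reported.
    """
    problems = []
    items = invoice.get("line_items") or []
    subtotal = invoice.get("subtotal")
    tax = invoice.get("tax")
    total = invoice.get("total")

    # Each line: quantity * unit_price should match the line total.
    for i, item in enumerate(items):
        qty = item.get("quantity")
        price = item.get("unit_price")
        line_total = item.get("total")
        if None not in (qty, price, line_total):
            if not math.isclose(qty * price, line_total, abs_tol=tolerance):
                problems.append(f"line_items[{i}]: quantity * unit_price != total")

    # Sum of line totals should match the subtotal.
    item_totals = [item.get("total") for item in items]
    if subtotal is not None and items and None not in item_totals:
        if not math.isclose(sum(item_totals), subtotal, abs_tol=tolerance):
            problems.append("subtotal != sum of line item totals")

    # Subtotal plus tax should match the grand total.
    if None not in (subtotal, tax, total):
        if not math.isclose(subtotal + tax, total, abs_tol=tolerance):
            problems.append("total != subtotal + tax")

    return problems
```

An empty list means the figures agree within the tolerance. A non-empty list is a signal to review the document, not proof that the extraction is wrong, since the invoice itself may contain the discrepancy.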
3. Be Specific About Data Types
- Use "number" for amounts, quantities, and percentages
- Use "string" for text, dates, and IDs
- Use arrays [] for repeating items
- Use nested objects {} for grouped data
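The same type labels can double as a lightweight check on what comes back from extraction. The following Python sketch is only an illustration under stated assumptions: type_errors is a hypothetical helper, the schema is treated as a plain template of "string" and "number" labels as in the examples in this article, arrays are assumed to hold a single item template, and null values are accepted as "field not found".

```python
from typing import Any


def type_errors(template: Any, value: Any, path: str = "$") -> list[str]:
    """Compare an extracted value with a schema template and list mismatches."""
    if value is None:
        return []  # assumption: null means the field was not found, which is allowed
    if template == "string":
        return [] if isinstance(value, str) else [f"{path}: expected string"]
    if template == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        return [] if ok else [f"{path}: expected number"]
    if isinstance(template, list):
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        if not template:
            return []  # empty template array: nothing to check items against
        errors = []
        for i, item in enumerate(value):
            errors += type_errors(template[0], item, f"{path}[{i}]")
        return errors
    if isinstance(template, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        errors = []
        for key, sub_template in template.items():
            errors += type_errors(sub_template, value.get(key), f"{path}.{key}")
        return errors
    return [f"{path}: unsupported template {template!r}"]
```

A typical catch is an amount returned as the string "12.50" where the schema asked for a number, which would otherwise surface later as a failed calculation.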
Common Invoice Schema Patterns
Basic Invoice
For simple extraction needs:
{
  "invoice_number": "string",
  "date": "string",
  "vendor_name": "string",
  "total_amount": "number"
}
Detailed Invoice
For comprehensive extraction:
{
  "invoice_number": "string",
  "invoice_date": "string",
  "due_date": "string",
  "vendor": {
    "name": "string",
    "address": "string",
    "phone": "string",
    "email": "string"
  },
  "customer": {
    "name": "string",
    "address": "string"
  },
  "line_items": [
    {
      "sku": "string",
      "description": "string",
      "quantity": "number",
      "unit_price": "number",
      "discount": "number",
      "total": "number"
    }
  ],
  "subtotal": "number",
  "discount_total": "number",
  "tax_rate": "number",
  "tax_amount": "number",
  "shipping": "number",
  "total": "number",
  "payment_terms": "string",
  "notes": "string"
}
Tips for Better Accuracy
- Start minimal, then expand - Begin with essential fields and add more as needed
- Test with real documents - Use actual invoices from your workflow
- Handle variations - Some invoices may not have all fields
- Use consistent naming - Stick to snake_case or camelCase throughout
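Naming consistency is easy to check mechanically as a schema grows. The sketch below, in Python, walks a schema template and reports keys that are not snake_case. The helper name non_snake_case_keys and the regular expression are assumptions chosen for illustration; a team that prefers camelCase would swap in a different pattern.

```python
import re

# Lowercase words separated by single underscores, e.g. "invoice_number".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def non_snake_case_keys(template, path=""):
    """Return the paths of all schema keys that are not snake_case."""
    found = []
    if isinstance(template, dict):
        for key, sub_template in template.items():
            here = f"{path}.{key}" if path else key
            if not SNAKE_CASE.match(key):
                found.append(here)
            found += non_snake_case_keys(sub_template, here)
    elif isinstance(template, list):
        # Array items share the parent's path with a "[]" suffix.
        for sub_template in template:
            found += non_snake_case_keys(sub_template, path + "[]")
    return found
```

Running this whenever the schema changes keeps a mixed style from creeping in as new fields are added.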
Handling Edge Cases
Not all invoices are created equal. Some tips:
- Missing fields: The AI will return null for fields it can't find
- Multiple formats: The same schema works across different invoice layouts
- Handwritten notes: AI extraction handles handwriting better than traditional OCR
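Because missing fields come back as null, downstream code should look for them explicitly instead of assuming every field is populated. A minimal Python sketch, assuming the extraction result has already been parsed into dictionaries and lists (missing_fields is a hypothetical helper, not part of any API):

```python
def missing_fields(result, path=""):
    """Return the paths of all null values in a parsed extraction result."""
    if result is None:
        return [path or "$"]
    missing = []
    if isinstance(result, dict):
        for key, value in result.items():
            missing += missing_fields(value, f"{path}.{key}" if path else key)
    elif isinstance(result, list):
        for i, item in enumerate(result):
            missing += missing_fields(item, f"{path}[{i}]")
    return missing
```

The returned paths can drive a simple rule, such as routing an invoice to manual review when a required field like the invoice number is among them.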
Try It Yourself
Head to the Playground to test your schemas with real documents. You can iterate quickly and see exactly what the AI extracts.
