How to Automate Invoice Processing with an API
Invoice processing is one of the most common — and most tedious — document workflows in any business. Someone receives an invoice, opens it, types the vendor name into a spreadsheet, copies the line items, double-checks the total, and files it away. Multiply that by hundreds of invoices per month, and you have a full-time job that adds no value.
Here's how to automate the entire process using an API.
What Gets Extracted from an Invoice
A typical invoice contains a consistent set of data points:
- Vendor information — Name, address, VAT ID, bank details
- Invoice metadata — Invoice number, date, due date, payment terms
- Customer details — Who the invoice is addressed to
- Line items — Description, quantity, unit price, total per item
- Totals — Subtotal, tax rate, tax amount, grand total
- Payment information — IBAN, BIC, payment method
With schema-based extraction, you define exactly which of these fields you need, and the API returns them as clean JSON.
Building an Invoice Extraction Schema
Simple Schema (Key Fields Only)
If you just need the basics for bookkeeping:
{
  "type": "object",
  "properties": {
    "vendor_name": { "type": "string" },
    "invoice_number": { "type": "string" },
    "date": { "type": "string", "format": "date" },
    "due_date": { "type": "string", "format": "date" },
    "total": { "type": "number" },
    "currency": { "type": "string" }
  }
}
Full Schema (Complete Extraction)
For accounts payable automation where you need every detail:
{
  "type": "object",
  "properties": {
    "vendor": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "address": { "type": "string" },
        "vat_id": { "type": "string" },
        "iban": { "type": "string" },
        "bic": { "type": "string" }
      }
    },
    "invoice_number": { "type": "string" },
    "date": { "type": "string", "format": "date" },
    "due_date": { "type": "string", "format": "date" },
    "payment_terms": { "type": "string" },
    "customer": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "address": { "type": "string" }
      }
    },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "vat_rate": { "type": "number" },
          "total": { "type": "number" }
        }
      }
    },
    "subtotal": { "type": "number" },
    "vat_amount": { "type": "number" },
    "total": { "type": "number" },
    "currency": { "type": "string" }
  }
}
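Because both schemas declare a type for every field, the same schema object can double as a sanity check on whatever comes back. The following is a minimal sketch, not part of the Smole API: `checkAgainstSchema` is a hypothetical helper that only understands the four types used in this article (`object`, `array`, `number`, `string`) and ignores `format`.

```javascript
// Walk a JSON-Schema-like object and report fields whose extracted value has
// the wrong type. null/undefined are accepted everywhere, since a field may
// simply be absent from a given invoice.
function checkAgainstSchema(schema, value, path = "") {
  if (value === null || value === undefined) return [];
  const where = path || "(root)";
  switch (schema.type) {
    case "object": {
      if (typeof value !== "object" || Array.isArray(value)) {
        return [`${where}: expected object`];
      }
      // Recurse into every declared property
      return Object.entries(schema.properties || {}).flatMap(([key, sub]) =>
        checkAgainstSchema(sub, value[key], path ? `${path}.${key}` : key)
      );
    }
    case "array": {
      if (!Array.isArray(value)) return [`${where}: expected array`];
      return value.flatMap((item, i) =>
        checkAgainstSchema(schema.items || {}, item, `${path}[${i}]`)
      );
    }
    case "number":
      return typeof value === "number" && Number.isFinite(value)
        ? []
        : [`${where}: expected number`];
    case "string":
      return typeof value === "string" ? [] : [`${where}: expected string`];
    default:
      return []; // types not used in this article are not checked
  }
}
```

An empty array means every present field has the declared type. For production use, a full JSON Schema validator is likely a better fit than this hand-rolled walk.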
Processing an Invoice
1. Register Your Schema
curl -X POST https://api.smole.tech/api/schemas \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "invoice-full",
    "schema": { ... }
  }'
Save the returned schema ID — you'll use it for every invoice.
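The same registration call can be made from JavaScript. This is a sketch that mirrors the curl command above; the URL, headers, and body fields come from that command, while the function name `registerSchema` and the injectable `fetchImpl` parameter are illustrative. The response shape is not shown in this article, so the parsed body is returned unchanged.

```javascript
// Sketch of step 1 using fetch (built into browsers and Node 18+).
// fetchImpl is injectable so the function can be tested without a network.
async function registerSchema(name, schema, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl("https://api.smole.tech/api/schemas", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ name, schema }),
  });
  if (!response.ok) {
    throw new Error(`Schema registration failed with HTTP ${response.status}`);
  }
  // The schema ID is somewhere in this object; check the API documentation
  // for the exact field name.
  return response.json();
}
```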
2. Upload an Invoice
curl -X POST https://api.smole.tech/api/pipeline/file \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "schemaId=SCHEMA_ID"
This works with PDF invoices (digital or scanned), photographed invoices, Word documents, and even HTML invoices.
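In JavaScript, the multipart upload can be sketched with `FormData`. The form field names `file` and `schemaId` are taken from the curl command above; `uploadInvoice` and its parameters are hypothetical names, and the response body is returned as-is because its shape is not documented here.

```javascript
// Sketch of step 2. fileBlob is a Blob or File, e.g. from an <input type="file">
// in the browser or `new Blob([buffer])` in Node 18+.
async function uploadInvoice(fileBlob, filename, schemaId, apiKey, fetchImpl = fetch) {
  const form = new FormData();
  form.append("file", fileBlob, filename);
  form.append("schemaId", schemaId);

  const response = await fetchImpl("https://api.smole.tech/api/pipeline/file", {
    method: "POST",
    // No Content-Type header: fetch adds the multipart boundary itself
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  return response.json();
}
```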
3. Retrieve the Extracted Data
curl https://api.smole.tech/api/pipeline/PIPELINE_ID \
  -H "Authorization: Bearer YOUR_API_KEY"
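Extraction may not be finished when the first retrieval request returns, so in code this step often becomes a polling loop. The sketch below makes an assumption worth stressing: this article does not say how the API signals completion, so the default `isDone` check (`body.status === "completed"`) is a guess that should be replaced with whatever the documentation specifies. The interval and attempt limits are arbitrary defaults too.

```javascript
// Sketch of step 3 as a polling loop. Only the URL and the Authorization
// header come from the curl command; everything else is illustrative.
async function pollForResult(pipelineId, apiKey, options = {}) {
  const {
    isDone = (body) => body.status === "completed", // ASSUMPTION, see above
    intervalMs = 2000,
    maxAttempts = 30,
    fetchImpl = fetch,
  } = options;

  const url = `https://api.smole.tech/api/pipeline/${encodeURIComponent(pipelineId)}`;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!response.ok) {
      throw new Error(`Pipeline lookup failed with HTTP ${response.status}`);
    }
    const body = await response.json();
    if (isDone(body)) return body;
    // Wait before the next attempt, but not after the last one
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Pipeline ${pipelineId} not finished after ${maxAttempts} attempts`);
}
```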
Example response:
{
  "vendor": {
    "name": "TechParts Distribution GmbH",
    "address": "Industriestr. 15, 70469 Stuttgart",
    "vat_id": "DE298374651",
    "iban": "DE89370400440532013000",
    "bic": "COBADEFFXXX"
  },
  "invoice_number": "TP-2025-1847",
  "date": "2025-11-20",
  "due_date": "2025-12-20",
  "payment_terms": "Net 30",
  "customer": {
    "name": "Your Company GmbH",
    "address": "Musterstr. 10, 10115 Berlin"
  },
  "line_items": [
    { "description": "Server RAM 64GB DDR5", "quantity": 4, "unit_price": 189.00, "vat_rate": 0.19, "total": 756.00 },
    { "description": "NVMe SSD 2TB", "quantity": 2, "unit_price": 245.00, "vat_rate": 0.19, "total": 490.00 },
    { "description": "Network Cable Cat6 (50m)", "quantity": 10, "unit_price": 12.50, "vat_rate": 0.19, "total": 125.00 }
  ],
  "subtotal": 1371.00,
  "vat_amount": 260.49,
  "total": 1631.49,
  "currency": "EUR"
}
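The numbers in a response like this are internally redundant, which makes them easy to cross-check before booking: each line total should equal quantity times unit price, the line totals should sum to the subtotal, and subtotal plus VAT should equal the grand total. A minimal sketch of that check follows; `checkInvoiceTotals` is a hypothetical helper, and it assumes all numeric fields are present (not null) and that amounts have two decimal places.

```javascript
// Convert to integer cents so comparisons are not affected by
// floating-point rounding (e.g. 260.49 * 100 is not exactly 26049).
function toCents(amount) {
  return Math.round(amount * 100);
}

function checkInvoiceTotals(invoice) {
  const problems = [];

  invoice.line_items.forEach((item, index) => {
    if (toCents(item.quantity * item.unit_price) !== toCents(item.total)) {
      problems.push(`line_items[${index}]: quantity x unit_price does not equal total`);
    }
  });

  const itemSum = invoice.line_items.reduce(
    (sum, item) => sum + toCents(item.total),
    0
  );
  if (itemSum !== toCents(invoice.subtotal)) {
    problems.push("subtotal does not equal the sum of line item totals");
  }

  if (toCents(invoice.subtotal) + toCents(invoice.vat_amount) !== toCents(invoice.total)) {
    problems.push("total does not equal subtotal + vat_amount");
  }

  return problems; // empty array means the figures are consistent
}
```

For the example response, the line totals are 756.00 + 490.00 + 125.00 = 1371.00, and 1371.00 + 260.49 = 1631.49, so the function returns an empty array.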
Handling Invoice Variations
Invoices vary wildly in format. Some are clean PDFs generated by accounting software, others are handwritten notes, and most fall somewhere in between. Schema-based extraction handles this because it works from the content, not the layout.
Different Layouts
The same schema works whether the vendor name is at the top-left, top-right, or in a header. The extraction engine finds the data by context, not by position.
Different Languages
Invoices in German, English, French, or any other language are processed the same way. The field names in your schema are in your preferred language — the extraction maps document content accordingly.
Missing Fields
If a field doesn't exist in the invoice (e.g., no BIC code), the API returns null for that field. Your code should handle optional fields gracefully.
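One way to handle this is to decide up front which fields your workflow cannot proceed without and flag invoices where any of them came back null. The sketch below uses dotted paths such as `vendor.bic`; the helper names are hypothetical.

```javascript
// Follow a dotted path like "vendor.bic" through nested objects.
// Returns undefined if any step along the way is missing.
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((current, key) => (current == null ? current : current[key]), obj);
}

// Return the required paths whose value is null or undefined.
function findMissingFields(invoice, requiredPaths) {
  return requiredPaths.filter((path) => getPath(invoice, path) == null);
}
```

An invoice with a non-empty result can be routed to manual review instead of being booked automatically.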
Integrating with Your Systems
Once you have structured JSON, feeding it into your existing tools is straightforward:
Accounting Software
Push extracted data to QuickBooks, Xero, Datev, or Lexware via their APIs. Map Smole's JSON fields to the accounting system's expected format.
ERP Systems
Feed invoice data into SAP, Oracle, or Microsoft Dynamics. The structured JSON maps cleanly to ERP entry formats.
Spreadsheets
For simpler workflows, write extracted data to Google Sheets or Excel via their APIs. Each invoice becomes a row.
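Spreadsheet APIs generally expect a row as a flat array of cell values, so the nested JSON has to be flattened in a fixed column order first. A minimal sketch, where the column selection and the name `toSheetRow` are illustrative choices rather than anything prescribed by the API:

```javascript
// Column order for the sheet; adjust to match your own header row.
const INVOICE_COLUMNS = [
  "invoice_number",
  "date",
  "due_date",
  "vendor_name",
  "subtotal",
  "vat_amount",
  "total",
  "currency",
];

// Flatten one extracted invoice into a single row of cell values.
// Missing (null/undefined) fields become empty cells.
function toSheetRow(invoice) {
  const flat = { ...invoice, vendor_name: invoice.vendor?.name };
  return INVOICE_COLUMNS.map((column) => flat[column] ?? "");
}
```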
Databases
Insert extracted data directly into PostgreSQL, MySQL, or any database. The JSON structure maps naturally to relational tables.
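As a rough illustration of that mapping, the invoice header becomes one row and each line item becomes a row in a child table keyed by invoice number. The table and column names below (`invoices`, `invoice_line_items`) are hypothetical, and the `$1` placeholders follow PostgreSQL's parameter style. The function only builds statements; executing them is left to your database client.

```javascript
// Build parameterized INSERT statements for one extracted invoice.
// Values are passed separately from the SQL text to avoid injection.
function toInsertStatements(invoice) {
  const header = {
    text:
      "INSERT INTO invoices (invoice_number, vendor_name, invoice_date, due_date, total, currency) " +
      "VALUES ($1, $2, $3, $4, $5, $6)",
    values: [
      invoice.invoice_number,
      invoice.vendor?.name ?? null,
      invoice.date,
      invoice.due_date,
      invoice.total,
      invoice.currency,
    ],
  };

  const items = (invoice.line_items ?? []).map((item, index) => ({
    text:
      "INSERT INTO invoice_line_items (invoice_number, position, description, quantity, unit_price, total) " +
      "VALUES ($1, $2, $3, $4, $5, $6)",
    values: [
      invoice.invoice_number,
      index + 1, // 1-based position on the invoice
      item.description,
      item.quantity,
      item.unit_price,
      item.total,
    ],
  }));

  return [header, ...items];
}
```

Running the header and line item inserts inside one transaction keeps a partially written invoice from ending up in the database.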
Batch Processing Invoices
For high-volume scenarios — processing a month's worth of invoices at once:
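// submitPipeline and pollForResult are helpers you implement around the
// upload and retrieval calls from steps 2 and 3.
async function processInvoiceBatch(files, schemaId) {
  // Submit all invoices in parallel
  const pipelines = await Promise.all(
    files.map(file => submitPipeline(file, schemaId))
  );

  // Poll each pipeline until its result is ready.
  // Note: Promise.all rejects as soon as any single invoice fails.
  const results = await Promise.all(
    pipelines.map(p => pollForResult(p.id))
  );

  return results;
}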
Smole handles concurrent requests, so batch processing is efficient even at scale.
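For large batches it can still be worth capping how many invoices are in flight at once and keeping one bad file from failing the whole run. The following sketch shows one way to do that with a small worker pool. The `submit` and `poll` functions are passed in by the caller (they stand in for the upload and retrieval helpers), the default limit of 5 is arbitrary, and `pipeline.id` assumes the upload response exposes the pipeline ID under `id`, as the snippet above does.

```javascript
// Process files with at most `concurrency` invoices in flight.
// Each result records success or failure for its own file.
async function processInvoiceBatchSafely(files, schemaId, { submit, poll, concurrency = 5 }) {
  const results = new Array(files.length);
  let next = 0; // index of the next file to pick up

  async function worker() {
    while (next < files.length) {
      const index = next++;
      const file = files[index];
      try {
        const pipeline = await submit(file, schemaId);
        const data = await poll(pipeline.id);
        results[index] = { file, ok: true, data };
      } catch (error) {
        results[index] = { file, ok: false, error: error.message };
      }
    }
  }

  // Start the workers; each one pulls files until none are left
  const workerCount = Math.max(1, Math.min(concurrency, files.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results come back in the same order as the input files, so failed entries can be retried or sent to manual review by index.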
Cost of Manual vs Automated Processing
| | Manual | Automated |
|---|---|---|
| Time per invoice | 15-30 minutes | Seconds |
| Error rate | 2-5% | Near zero |
| Scales with volume | No (need more people) | Yes (same API) |
| Works after hours | No | Yes |
For a company processing 200 invoices per month at 20 minutes each, that's nearly 67 hours of manual work per month, or more than a week and a half of one person's time at 40 hours per week.
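To plug in your own volumes, the arithmetic is simply invoices times minutes divided by 60. A small sketch (the function name is illustrative):

```javascript
// Hours of manual processing per month for a given volume and handling time.
function monthlyManualHours(invoicesPerMonth, minutesPerInvoice) {
  return (invoicesPerMonth * minutesPerInvoice) / 60;
}

console.log(monthlyManualHours(200, 20).toFixed(1)); // "66.7"
console.log(monthlyManualHours(200, 15)); // 50, low end of the 15-30 minute range
console.log(monthlyManualHours(200, 30)); // 100, high end of the range
```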
Try It Now
Upload an invoice in the Playground to see extraction results instantly. Define your schema, drop in a PDF, and get JSON back in seconds.
For full API integration details, see the documentation.
