Receipt OCR API: Extract Data from Receipts Automatically
Receipts are small, crumpled, faded, and full of valuable data. Whether you're building an expense management app, automating bookkeeping, or tracking business purchases, extracting data from receipts is a common need — and a surprisingly hard one.
Traditional OCR reads the text but doesn't understand it. Schema-based extraction reads the text and turns it into structured data you can use directly.
The Challenge with Receipts
Receipts are harder than most documents because:
- Small text — Thermal printers use tiny fonts
- Faded ink — Thermal paper degrades over time
- Varying layouts — Every store has a different format
- Abbreviations — "CHKN BRST" instead of "Chicken Breast"
- Crumpled and skewed — Receipts get folded, wrinkled, and photographed at angles
- Multiple languages — Store names and items in local languages
Despite all this, schema-based extraction can produce reliable results because it maps the recognized text onto the fields of a receipt (store, date, line items, totals) rather than returning raw characters.
Receipt Extraction Schema
Basic Schema
For expense tracking — just the key fields:
{
  "type": "object",
  "properties": {
    "store_name": { "type": "string" },
    "date": { "type": "string", "format": "date" },
    "total": { "type": "number" },
    "currency": { "type": "string" },
    "payment_method": { "type": "string" }
  }
}
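Once a result comes back in this shape, it can be convenient to load it into a typed object before storing it. The sketch below shows one way to do that in Python. BasicReceipt and parse_basic_receipt are illustrative names, not part of the API, and the code assumes the date field arrives as an ISO 8601 string (YYYY-MM-DD), as the schema's "format": "date" suggests.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class BasicReceipt:
    """Typed view of a result matching the basic schema (illustrative)."""
    store_name: Optional[str]
    purchase_date: Optional[date]
    total: Optional[Decimal]
    currency: Optional[str]
    payment_method: Optional[str]


def parse_basic_receipt(data: dict) -> BasicReceipt:
    """Convert an extraction result dict into a BasicReceipt.

    Missing or null fields become None. Assumes "date" is YYYY-MM-DD.
    """
    raw_date = data.get("date")
    raw_total = data.get("total")
    return BasicReceipt(
        store_name=data.get("store_name"),
        purchase_date=date.fromisoformat(raw_date) if raw_date else None,
        # Go through str() so 14.64 becomes Decimal("14.64"), not a float expansion.
        total=Decimal(str(raw_total)) if raw_total is not None else None,
        currency=data.get("currency"),
        payment_method=data.get("payment_method"),
    )
```

Using Decimal for the total avoids floating-point rounding when amounts are later summed across many receipts.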
Detailed Schema
For full receipt digitization with line items:
{
  "type": "object",
  "properties": {
    "store": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "address": { "type": "string" },
        "phone": { "type": "string" },
        "tax_id": { "type": "string" }
      }
    },
    "date": { "type": "string", "format": "date" },
    "time": { "type": "string" },
    "receipt_number": { "type": "string" },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "total": { "type": "number" }
        }
      }
    },
    "subtotal": { "type": "number" },
    "tax_rate": { "type": "number" },
    "tax_amount": { "type": "number" },
    "total": { "type": "number" },
    "payment_method": { "type": "string" },
    "currency": { "type": "string" }
  }
}
Example: Grocery Receipt
Upload a photo of a grocery receipt:
curl -X POST https://api.smole.tech/api/pipeline/file \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@grocery-receipt.jpg" \
  -F "schemaId=YOUR_SCHEMA_ID"
Get structured data:
{
  "store": {
    "name": "REWE City",
    "address": "Friedrichstr. 67, 10117 Berlin",
    "phone": null,
    "tax_id": "DE136695976"
  },
  "date": "2025-12-18",
  "time": "14:32",
  "receipt_number": "4821-0094",
  "items": [
    { "name": "Bio Bananen", "quantity": 1, "unit_price": 1.99, "total": 1.99 },
    { "name": "Hafermilch 1L", "quantity": 2, "unit_price": 1.49, "total": 2.98 },
    { "name": "Vollkornbrot", "quantity": 1, "unit_price": 2.79, "total": 2.79 },
    { "name": "Tomaten 500g", "quantity": 1, "unit_price": 1.89, "total": 1.89 },
    { "name": "Olivenöl 500ml", "quantity": 1, "unit_price": 4.99, "total": 4.99 }
  ],
  "subtotal": 14.64,
  "tax_rate": 0.07,
  "tax_amount": 1.02,
  "total": 15.66,
  "payment_method": "EC-Karte",
  "currency": "EUR"
}
Every item is extracted with the correct price, even from a thermal-printed receipt photographed on a phone.
Example: Restaurant Receipt
{
  "store": {
    "name": "Trattoria Milano",
    "address": "Kantstr. 42, 10625 Berlin"
  },
  "date": "2025-12-20",
  "items": [
    { "name": "Pizza Margherita", "quantity": 2, "unit_price": 11.50, "total": 23.00 },
    { "name": "Insalata Mista", "quantity": 1, "unit_price": 8.90, "total": 8.90 },
    { "name": "Tiramisu", "quantity": 2, "unit_price": 6.50, "total": 13.00 },
    { "name": "Acqua 0.75L", "quantity": 1, "unit_price": 3.50, "total": 3.50 }
  ],
  "subtotal": 48.40,
  "tax_rate": 0.19,
  "tax_amount": 9.20,
  "total": 57.60,
  "payment_method": "Visa ending 4821",
  "currency": "EUR"
}
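Because line items, subtotal, tax, and total are all extracted, they can be cross-checked before the data reaches accounting. The following is a minimal sketch, assuming tax is added on top of the subtotal as in the restaurant example above (48.40 + 9.20 = 57.60). The check_receipt_totals function and the one-cent tolerance are illustrative choices, not part of the API.

```python
from decimal import Decimal


def _money(value) -> Decimal:
    """Convert a JSON number to Decimal without binary float noise."""
    return Decimal(str(value))


def check_receipt_totals(receipt: dict, tolerance: str = "0.01") -> list:
    """Return a list of problems found; an empty list means the sums agree.

    Assumes subtotal, tax_rate, tax_amount and total are all present and
    that total = subtotal + tax_amount.
    """
    tol = Decimal(tolerance)
    problems = []

    items_sum = sum(
        (_money(item["total"]) for item in receipt.get("items", [])),
        Decimal("0"),
    )
    subtotal = _money(receipt["subtotal"])
    tax_amount = _money(receipt["tax_amount"])
    total = _money(receipt["total"])

    # 1. Line items should add up to the subtotal.
    if abs(items_sum - subtotal) > tol:
        problems.append(f"items sum to {items_sum}, subtotal is {subtotal}")

    # 2. Tax amount should match subtotal * tax_rate (within rounding).
    expected_tax = subtotal * _money(receipt["tax_rate"])
    if abs(expected_tax - tax_amount) > tol:
        problems.append(f"expected tax {expected_tax}, got {tax_amount}")

    # 3. Subtotal plus tax should equal the total.
    if abs(subtotal + tax_amount - total) > tol:
        problems.append(f"subtotal + tax is {subtotal + tax_amount}, total is {total}")

    return problems
```

A non-empty list is a reasonable signal to route the receipt to manual review rather than reject it outright, since some receipts print tax-inclusive prices.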
Building a Receipt Processing App
Mobile Expense Tracker
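import os

import requests
from fastapi import FastAPI, UploadFile

# Base URL as used in the curl example above. The environment variable
# names are placeholders; use whatever configuration your app already has.
API_BASE = "https://api.smole.tech/api"
API_KEY = os.environ.get("API_KEY", "YOUR_API_KEY")
RECEIPT_SCHEMA_ID = os.environ.get("RECEIPT_SCHEMA_ID", "YOUR_SCHEMA_ID")

app = FastAPI()


def poll_for_result(pipeline_id):
    """Placeholder: wait for the pipeline to finish and return its result."""
    # Not shown in this article; implement it against the API documentation.
    raise NotImplementedError("poll_for_result is not implemented")


@app.post("/scan-receipt")
def scan_receipt(file: UploadFile):
    """Upload a receipt photo and get structured data back."""
    # A plain (non-async) handler runs in FastAPI's thread pool, so the
    # blocking requests call does not stall the event loop.
    resp = requests.post(
        f"{API_BASE}/pipeline/file",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": (file.filename, file.file.read())},
        data={"schemaId": RECEIPT_SCHEMA_ID},
        timeout=30,
    )
    resp.raise_for_status()
    pipeline_id = resp.json()["id"]

    # Wait for result
    result = poll_for_result(pipeline_id)

    return {
        "store": (result.get("store") or {}).get("name"),
        "date": result.get("date"),
        "total": result.get("total"),
        "items": result.get("items", []),
        "category": classify_expense(result),
    }


def classify_expense(receipt):
    """Simple category classification based on store name."""
    # The store object or its name may be null, so fall back to "".
    store = ((receipt.get("store") or {}).get("name") or "").lower()
    if any(w in store for w in ["rewe", "edeka", "aldi", "lidl"]):
        return "Groceries"
    if any(w in store for w in ["restaurant", "trattoria", "café"]):
        return "Dining"
    if any(w in store for w in ["shell", "aral", "esso"]):
        return "Transport"
    return "Other"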
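The handler above relies on poll_for_result, whose implementation is not shown here. One generic way to approach it is a polling loop with a timeout. In the sketch below, poll_pipeline and fetch_status are hypothetical names, and the "status", "result", "completed", and "failed" values are assumptions made for illustration. The real endpoint and response fields should be taken from the API documentation.

```python
import time
from typing import Callable


def poll_pipeline(
    pipeline_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Call fetch_status(pipeline_id) until the pipeline finishes.

    fetch_status is supplied by the caller and should return the parsed
    JSON status response. The field names checked here are assumed.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(pipeline_id)
        status = payload.get("status")  # assumed field name
        if status == "completed":       # assumed value
            return payload["result"]    # assumed field name
        if status == "failed":          # assumed value
            raise RuntimeError(f"Pipeline {pipeline_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Pipeline {pipeline_id} not finished after {timeout}s")
        time.sleep(interval)
```

Passing the status lookup in as a function keeps the loop independent of any particular HTTP client and makes it easy to exercise with a fake in tests.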
Tips for Better Receipt Extraction
Photo Quality
- Flatten the receipt before photographing — smooth out wrinkles and folds
- Use good lighting — even, diffused light without shadows
- Fill the frame — get the receipt to fill most of the photo
- Focus on the text — make sure the smallest text is readable in the photo
Schema Tips
- Include currency — Essential for international expense tracking
- Use payment_method — Useful for reconciliation with bank statements
- Keep items flexible — Not every receipt has quantity and unit_price; some just have item name and total
- Add store.tax_id — Needed for VAT reclaim in many European countries
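Keeping items flexible means downstream code should tolerate missing fields. Below is a minimal sketch of one way to handle that, assuming absent fields are either omitted or returned as null (as with phone in the grocery example). normalize_item is an illustrative helper, and defaulting quantity to 1 is a design choice, not API behavior.

```python
from decimal import Decimal
from typing import Optional


def _to_decimal(value) -> Optional[Decimal]:
    """Convert a JSON number to Decimal, passing None through."""
    return None if value is None else Decimal(str(value))


def normalize_item(item: dict) -> dict:
    """Fill in quantity, unit_price and total where they can be inferred."""
    name = item.get("name") or ""
    quantity = _to_decimal(item.get("quantity"))
    unit_price = _to_decimal(item.get("unit_price"))
    total = _to_decimal(item.get("total"))

    # Receipts that list only a name and a price imply a single unit.
    if quantity is None:
        quantity = Decimal("1")

    # Derive whichever price is missing from the other one (no rounding applied).
    if unit_price is None and total is not None and quantity != 0:
        unit_price = total / quantity
    if total is None and unit_price is not None:
        total = unit_price * quantity

    return {
        "name": name,
        "quantity": quantity,
        "unit_price": unit_price,
        "total": total,
    }
```

For example, an item extracted as just a name and a total of 6.50 comes out with quantity 1 and unit_price 6.50, so later code can treat every line item uniformly.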
Use Cases
- Expense management — Employees snap receipts, data flows to accounting
- Bookkeeping automation — Small businesses processing daily sales receipts
- VAT recovery — Extracting tax details for cross-border VAT reclaims
- Personal finance — Tracking spending by category from receipt photos
- Audit trails — Digitizing paper receipts for compliance documentation
Try It Now
Photograph a receipt and upload it in the Playground. Define your schema and see structured data in seconds — even from faded, crumpled thermal paper.
For API integration, see the documentation.
