Everything you need for structured data extraction

A complete API for converting documents and extracting structured data. No ML expertise required. Just define a schema and go.

Easy File Upload

Upload documents directly as multipart form data. Supports PDF, DOCX, XLSX, and many other formats.

Document Conversion

Convert PDFs, Word docs, spreadsheets, and more to clean Markdown. Preserves tables, links, and formatting.

Schema-Driven Extraction

Define a JSON schema, get structured data back. Context-aware extraction that works across document types.

Pipeline Processing

Convert and extract in a single API call. Upload a PDF, define a schema, get JSON back. No intermediate steps.

OCR Support

Extract text from scanned documents and images. Perfect for invoices, receipts, and legacy documents.

JSON Schema Validation

Define your output structure with JSON Schema. Get predictable, validated data every time.
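As an illustration of schema-driven output, here is one way an invoice schema could be written, shown as a Python dict so it can be checked locally. The field names mirror the invoice example further down this page. The schema and the small `check` helper are hypothetical and written for this sketch. `check` supports only the `type`, `required`, `properties`, and `items` keywords, so it is not a full JSON Schema validator, and Smole's own schema handling and validation may differ.

```python
# Hypothetical invoice schema; field names mirror the example output on this page.
INVOICE_SCHEMA = {
    "type": "object",
    "required": ["vendor", "invoice_number", "date", "line_items", "total"],
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "customer": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["description", "quantity", "amount"],
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "amount": {"type": "number"},
                },
            },
        },
        "subtotal": {"type": "number"},
        "vat_rate": {"type": "number"},
        "vat_amount": {"type": "number"},
        "total": {"type": "number"},
    },
}

# JSON Schema type names mapped to Python types (only the types used above).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
}


def check(value, schema, path="$"):
    """Return a list of error strings (simplified: type/required/properties/items only)."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        if expected not in _TYPES:
            raise ValueError(f"unsupported type in this sketch: {expected}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _TYPES[expected]):
            return [f"{path}: expected {expected}"]
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(check(value[key], sub, f"{path}.{key}"))
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(check(item, schema["items"], f"{path}[{i}]"))
    return errors
```

An empty list means the result matches the schema; each string otherwise points at the offending field by path (for example `$.line_items[0].quantity`). For production use, a complete validator such as the `jsonschema` package covers the full specification.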

Usage Dashboard

Track token usage, processing costs, and job history. Full visibility into your API usage in real time.

Budget Alerts

Set monthly spending limits and get notified at configurable thresholds. No more surprise bills.

Secure by Design

API key authentication, encryption in transit, and automatic data retention policies. Your documents stay safe.

Document extraction in action

Upload a document, get structured JSON back

Input: Invoice PDF

Example Company Invoice #2024-0831
Date: 2024-08-31
Bill to: Sample Corp.

Item                 Qty    Price
Web development       1    €450.00
Monthly hosting       1    €125.00
Maintenance plan      1     €75.00

Subtotal: €650.00
VAT (21%): €136.50
Total: €786.50

Output: Structured JSON

{
  "vendor": "Example Company",
  "invoice_number": "2024-0831",
  "date": "2024-08-31",
  "customer": "Sample Corp.",
  "line_items": [
    { "description": "Web development", "quantity": 1, "amount": 450.00 },
    { "description": "Monthly hosting", "quantity": 1, "amount": 125.00 },
    { "description": "Maintenance plan", "quantity": 1, "amount": 75.00 }
  ],
  "subtotal": 650.00,
  "vat_rate": 0.21,
  "vat_amount": 136.50,
  "total": 786.50
}
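The figures in this example are internally consistent: 450.00 + 125.00 + 75.00 = 650.00, 650.00 × 0.21 = 136.50, and 650.00 + 136.50 = 786.50. A downstream system might re-check this on every extracted invoice. The sketch below is one way to do that on your side; it is not an API feature, `totals_consistent` is a hypothetical name, and it assumes `amount` is the line total (not a unit price). It uses `Decimal` to avoid binary floating-point rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_money(x) -> Decimal:
    # Convert via str() so 136.5 becomes Decimal("136.5"), not the float's binary expansion.
    return Decimal(str(x)).quantize(CENT, rounding=ROUND_HALF_UP)


def totals_consistent(invoice: dict) -> bool:
    """Hypothetical post-extraction check: line items -> subtotal -> VAT -> total."""
    subtotal = to_money(invoice["subtotal"])
    # Assumes each line item's "amount" is already the line total.
    items_sum = sum((to_money(item["amount"]) for item in invoice["line_items"]), Decimal("0"))
    expected_vat = (subtotal * Decimal(str(invoice["vat_rate"]))).quantize(
        CENT, rounding=ROUND_HALF_UP
    )
    return (
        items_sum == subtotal
        and expected_vat == to_money(invoice["vat_amount"])
        and subtotal + expected_vat == to_money(invoice["total"])
    )


# The numeric fields from the example output above.
EXAMPLE_INVOICE = {
    "line_items": [
        {"description": "Web development", "quantity": 1, "amount": 450.00},
        {"description": "Monthly hosting", "quantity": 1, "amount": 125.00},
        {"description": "Maintenance plan", "quantity": 1, "amount": 75.00},
    ],
    "subtotal": 650.00,
    "vat_rate": 0.21,
    "vat_amount": 136.50,
    "total": 786.50,
}

print(totals_consistent(EXAMPLE_INVOICE))  # True
```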

Why teams choose Smole for document extraction

See how Smole compares with traditional document extraction approaches

                     Smole                        Traditional
Setup time           Minutes                      Days to weeks
New document types   Just define a schema         Build new parsers
Format support       PDF, DOCX, images, etc.      Format-specific tools
Layout changes       Handles automatically        Breaks existing parsers
Data accuracy        Context-aware extraction     Pattern matching only
Maintenance          Zero                         Ongoing updates required

Reliable document processing at scale

Handles scanned and digital PDFs

OCR-powered conversion for scanned documents, native text extraction for digital files.

Scales to large batches

Process hundreds or thousands of documents through the same API. Built for batch workloads.

Schema-driven output

You define the JSON structure. Every extraction follows your schema — consistent, predictable results.

Deterministic output structure

Same schema, same document type — same output structure. No surprises in production.
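If your pipeline depends on a stable output structure, you can also assert it on your side. One simple approach, sketched here as an assumption rather than anything the API provides, is to reduce each JSON result to its shape (keys, nesting, and JSON value types, with the values themselves discarded) and compare shapes across documents. `shape` is a hypothetical helper; note that optional fields present in one result but not another will make shapes differ.

```python
import json


def _leaf(value):
    # Map a JSON leaf to its JSON type name; ints and floats are both "number".
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"not a JSON value: {value!r}")


def shape(value):
    """Replace every leaf with its type name, keeping keys and nesting."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in value.items()}
    if isinstance(value, list):
        # Collapse arrays to the distinct element shapes, so 1 item and 3 items compare equal.
        seen = []
        for item in value:
            s = shape(item)
            if s not in seen:
                seen.append(s)
        return seen
    return _leaf(value)


a = json.loads('{"vendor": "Example Company", "total": 786.50, '
               '"line_items": [{"amount": 450.00}, {"amount": 125.00}]}')
b = json.loads('{"vendor": "Other Ltd", "total": 99, "line_items": [{"amount": 99}]}')
print(shape(a) == shape(b))  # True
```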

Ready to automate document processing?

Start extracting structured data from your documents in minutes.