Structured data extraction API

Automate document processing.
Get structured JSON from any file.

Extract reliable, schema-based data from PDFs, scans, and other documents without building custom pipelines.

$ curl -X POST api.smole.tech/api/pipeline/file -F "file=@doc.pdf"

Document extraction features

Turn unstructured documents into structured, actionable data

Document conversion

Convert PDFs, Word docs, scanned images, and more to clean Markdown. Works on both digital and scanned documents with built-in OCR.

Schema-driven extraction

You define the JSON schema; we extract the data. Context-aware extraction understands your documents and returns exactly the structure you need.
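To make "you define the schema" concrete, here is a minimal sketch of what a schema for an invoice might look like, written as a Python dict. The field names (`invoice_number`, `issue_date`, `total`, `line_items`) and the sample result are hypothetical, and this page does not say which JSON Schema dialect or keywords the service accepts, so treat it as an illustration rather than a reference.

```python
import json

# Hypothetical invoice schema; field names are illustrative, not taken from the Smole docs.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}


def missing_required(schema, data):
    """Return the top-level required keys that are absent from data."""
    return [key for key in schema.get("required", []) if key not in data]


# A made-up extraction result, shaped the way the schema asks for.
example_result = {
    "invoice_number": "INV-0042",
    "issue_date": "2024-03-01",
    "total": 129.5,
    "line_items": [{"description": "Consulting", "amount": 129.5}],
}

print(json.dumps(INVOICE_SCHEMA, indent=2))
print(missing_required(INVOICE_SCHEMA, example_result))
```

The `missing_required` helper only checks top-level required keys. It is a quick sanity check on a returned result, not a full schema validator.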

Pipeline processing

Convert and extract in a single API call. Upload a PDF, get JSON back. Scales from one document to thousands without breaking a sweat.

API-first design

RESTful API you can integrate in minutes. Simple endpoints, clear responses, and detailed docs so you spend time building, not debugging.

“We built Smole because we got tired of copying data out of PDFs like it was 2003. Documents go in, structured JSON comes out, and you get to keep your sanity. It's honestly just a mole that digs through your paperwork so you don't have to.”

How document extraction works

Three steps. One upload. Structured data.

1
POST /api/schemas

Register a JSON schema that defines the data structure you want to extract, and get back a schema ID. Or use one of the schemas we provide to make setup even easier.

2
POST /api/pipeline/file

Upload your document along with the schema ID. We convert it to Markdown and extract structured data that matches the schema.

3
GET /api/pipeline/:id

Poll for status. When processing is complete, you get clean JSON data matching your schema.
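The polling in step 3 could be sketched roughly as follows, using only the Python standard library. Only the `GET /api/pipeline/:id` path comes from this page. The `https://` scheme, the `status` and `data` response keys, and the `"completed"` and `"failed"` values are assumptions made for illustration, and any authentication header is left out because the page does not describe one.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.smole.tech"  # assumption: HTTPS; the page shows only the host


def fetch_job(job_id):
    """GET /api/pipeline/:id and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(f"{BASE_URL}/api/pipeline/{job_id}") as resp:
        return json.load(resp)


def wait_for_result(job_id, fetch=fetch_job, interval=2.0, timeout=120.0, sleep=time.sleep):
    """Poll until the job reports a terminal status, then return its data.

    The "status" and "data" keys and the "completed"/"failed" values are
    hypothetical; check the real response shape in the API docs.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        status = job.get("status")
        if status == "completed":
            return job.get("data")
        if status == "failed":
            raise RuntimeError(f"pipeline job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pipeline job {job_id} still {status!r} after {timeout}s")
        # Wait before asking again so the API is not hammered with requests.
        sleep(interval)
```

A call such as `wait_for_result(job_id)` would block until the job finishes or the timeout passes. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without touching the network.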