Structured data extraction API
Extract reliable, schema-based data from PDFs, scans, and documents — without building custom pipelines.
Turn unstructured documents into structured, actionable data
Convert PDFs, Word docs, scanned images, and more to clean Markdown. Works on both digital and scanned documents with built-in OCR.
You define the JSON schema, we extract the data. Context-aware extraction that understands your documents and returns exactly the structure you need.
Convert and extract in a single API call. Upload a PDF, get JSON back. Scales from one document to thousands without breaking a sweat.
RESTful API you can integrate in minutes. Simple endpoints, clear responses, and detailed docs so you spend time building, not debugging.
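To make "you define the JSON schema" concrete, here is a minimal sketch of what such a schema might look like for an invoice, written as a Python dict. The field names (`invoice_number`, `issue_date`, `total`, `line_items`) and the `missing_required` helper are hypothetical illustrations. The page does not say which JSON Schema dialect or keywords the service accepts, so treat the shape as an assumption.

```python
import json

# Hypothetical invoice schema. Field names are illustrative only, and the
# keywords used (type, properties, required) are assumed to be supported.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}


def missing_required(schema: dict, data: dict) -> list[str]:
    """Return the top-level required keys that are absent from data."""
    return [key for key in schema.get("required", []) if key not in data]


# Made-up values showing the kind of result this schema describes.
example_result = {"invoice_number": "INV-001", "total": 125.5, "line_items": []}

print(json.dumps(INVOICE_SCHEMA, indent=2))
print(missing_required(INVOICE_SCHEMA, example_result))
```

A quick client-side check like `missing_required` can catch an incomplete result before it reaches downstream code. It only inspects top-level keys and is not a full schema validator.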
“We built Smole because we got tired of copying data out of PDFs like it was 2003. Documents go in, structured JSON comes out, and you get to keep your sanity. It's honestly just a mole that digs through your paperwork so you don't have to.”
Three steps. One API call. Structured data.
/api/schemas: Register a JSON schema that defines the data structure you want to extract, and get back a schema ID. Or use one of the schemas we provide to make setup even easier.
/api/pipeline/file: Upload your document with the schema ID. We convert it to Markdown and extract the structured data.
/api/pipeline/:id: Poll for status. When processing is complete, you get clean JSON data matching your schema.
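The three steps above can be sketched as a small client-side loop. This is a rough outline under stated assumptions, not the official client. Only the endpoint paths come from this page. The HTTP methods, the payload keys (`schema`, `schema_id`, `file`), the response keys (`id`, `status`, `data`), and the status values are guesses for illustration. The transport is passed in as a callable, so the sketch makes no claim about the host, authentication, or how the file is encoded on upload.

```python
import time
from typing import Any, Callable, Optional

# (method, path, payload) -> decoded JSON response. How this reaches the
# service (host, auth, upload encoding) is left to the caller.
Send = Callable[[str, str, Optional[dict]], dict]


def extract(send: Send, schema: dict, document: bytes,
            poll_interval: float = 1.0, max_polls: int = 60) -> Any:
    """Register a schema, submit a document, and poll until data is ready."""
    # Step 1: register the schema and keep the returned ID (assumed key: "id").
    schema_id = send("POST", "/api/schemas", {"schema": schema})["id"]

    # Step 2: upload the document with the schema ID (assumed payload keys).
    job_id = send("POST", "/api/pipeline/file",
                  {"schema_id": schema_id, "file": document})["id"]

    # Step 3: poll the job until it reports completion (assumed status value).
    for _ in range(max_polls):
        job = send("GET", f"/api/pipeline/{job_id}", None)
        if job["status"] == "complete":
            return job["data"]
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} not complete after {max_polls} polls")


def make_fake_send(result: dict, polls_before_done: int = 2) -> Send:
    """In-memory stand-in for the service so the sketch runs offline."""
    state = {"polls": 0}

    def send(method: str, path: str, payload: Optional[dict] = None) -> dict:
        if (method, path) == ("POST", "/api/schemas"):
            return {"id": "schema-1"}
        if (method, path) == ("POST", "/api/pipeline/file"):
            return {"id": "job-1"}
        if (method, path) == ("GET", "/api/pipeline/job-1"):
            state["polls"] += 1
            if state["polls"] > polls_before_done:
                return {"status": "complete", "data": result}
            return {"status": "processing"}
        raise ValueError(f"unexpected request: {method} {path}")

    return send


fake = make_fake_send({"invoice_number": "INV-001", "total": 125.5})
print(extract(fake, {"type": "object"}, b"%PDF-placeholder", poll_interval=0))
```

Capping the number of polls keeps a stuck job from blocking the caller forever. In real use you would swap `make_fake_send` for a function that issues actual HTTP requests as described in the API docs.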