Organizations sit on years of archived PDFs and scanned documents that are effectively unsearchable. Digitizing these into structured data unlocks their value for analytics, compliance, and process automation.
[Scanned Document: Employee Record]

PERSONNEL FILE
Document Date: March 14, 2019
Classification: CONFIDENTIAL

Employee Information:
Full Name: Maria Schneider
Employee ID: EMP-2019-0342
Department: Engineering
Position: Senior Software Engineer
Start Date: April 1, 2019
Reports To: Thomas Weber, VP Engineering

Contact Details:
Email: m.schneider@company.de
Phone: +49 30 1234 5678
Address: Berliner Str. 45, 10715 Berlin

Compensation:
Annual Salary: 85,000 EUR
Pay Grade: E5
Review Cycle: Annual (next: April 2020)

Certifications:
- AWS Solutions Architect (2018)
- Certified Scrum Master (2017)
- ISO 27001 Lead Auditor (2019)

Emergency Contact:
Name: Klaus Schneider
Relationship: Spouse
Phone: +49 30 9876 5432

{
  "document_date": "2019-03-14",
  "classification": "CONFIDENTIAL",
  "employee": {
    "full_name": "Maria Schneider",
    "employee_id": "EMP-2019-0342",
    "department": "Engineering",
    "position": "Senior Software Engineer",
    "start_date": "2019-04-01",
    "reports_to": {
      "name": "Thomas Weber",
      "title": "VP Engineering"
    }
  },
  "contact": {
    "email": "m.schneider@company.de",
    "phone": "+49 30 1234 5678",
    "address": "Berliner Str. 45, 10715 Berlin"
  },
  "compensation": {
    "annual_salary": 85000,
    "currency": "EUR",
    "pay_grade": "E5",
    "next_review": "2020-04-01"
  },
  "certifications": [
    { "name": "AWS Solutions Architect", "year": 2018 },
    { "name": "Certified Scrum Master", "year": 2017 },
    { "name": "ISO 27001 Lead Auditor", "year": 2019 }
  ],
  "emergency_contact": {
    "name": "Klaus Schneider",
    "relationship": "Spouse",
    "phone": "+49 30 9876 5432"
  }
}

Tell Smole what data to extract using a JSON Schema.
{
  "type": "object",
  "properties": {
    "document_date": { "type": "string", "format": "date" },
    "classification": { "type": "string" },
    "employee": {
      "type": "object",
      "properties": {
        "full_name": { "type": "string" },
        "employee_id": { "type": "string" },
        "department": { "type": "string" },
        "position": { "type": "string" },
        "start_date": { "type": "string", "format": "date" },
        "reports_to": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "title": { "type": "string" }
          }
        }
      }
    },
    "contact": {
      "type": "object",
      "properties": {
        "email": { "type": "string", "format": "email" },
        "phone": { "type": "string" },
        "address": { "type": "string" }
      }
    },
    "compensation": {
      "type": "object",
      "properties": {
        "annual_salary": { "type": "number" },
        "currency": { "type": "string" },
        "pay_grade": { "type": "string" },
        "next_review": { "type": "string", "format": "date" }
      }
    },
    "certifications": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "year": { "type": "integer" }
        }
      }
    },
    "emergency_contact": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "relationship": { "type": "string" },
        "phone": { "type": "string" }
      }
    }
  }
}

Upload a document and define your schema. See results in seconds.
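
Once you have extracted output, it can be useful to confirm on your side that it has the shape the schema describes. The sketch below is a minimal, standard-library-only check covering just the keywords used in the schema above (`type`, `properties`, `items`, and `format: date`). The `check` helper is hypothetical and is not part of Smole; it also treats `format` as a hard check, which full JSON Schema validators do not necessarily do by default. For production use, a complete JSON Schema validator is the safer choice.

```python
from datetime import date

# Hypothetical helper, not part of Smole. Supports only a small subset of
# JSON Schema: type, properties, items, and format "date".
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
}


def check(value, schema, path="$"):
    """Return a list of problems found; an empty list means no mismatch."""
    expected = schema.get("type")
    if expected is not None:
        # This subset has no boolean type, and bool is a subclass of int in
        # Python, so any bool is reported as a mismatch.
        if isinstance(value, bool) or not isinstance(value, _TYPES[expected]):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]

    problems = []
    if isinstance(value, dict):
        # Only keys that are present are checked; nothing is required.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                problems += check(value[key], sub_schema, f"{path}.{key}")
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            problems += check(item, schema["items"], f"{path}[{i}]")
    elif isinstance(value, str) and schema.get("format") == "date":
        try:
            date.fromisoformat(value)
        except ValueError:
            problems.append(f"{path}: not an ISO 8601 date: {value!r}")
    return problems


# A trimmed-down version of the schema and record shown above.
schema = {
    "type": "object",
    "properties": {
        "document_date": {"type": "string", "format": "date"},
        "compensation": {
            "type": "object",
            "properties": {"annual_salary": {"type": "number"}},
        },
        "certifications": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "year": {"type": "integer"},
                },
            },
        },
    },
}

record = {
    "document_date": "2019-03-14",
    "compensation": {"annual_salary": 85000},
    "certifications": [{"name": "AWS Solutions Architect", "year": 2018}],
}

# A deliberately malformed record: unnormalized date, year as a string.
bad = {"document_date": "March 14, 2019", "certifications": [{"year": "2018"}]}

print(check(record, schema))  # → []
for problem in check(bad, schema):
    print(problem)
```

For the malformed record, the sketch reports the unnormalized date and the string-typed year, each with a path such as `$.certifications[0].year` pointing at the offending field.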