Create a new extraction schema.
Creates a JSON Schema that defines the structure of data to extract from documents. The schema must conform to our supported subset of JSON Schema 2020-12 features.
Supported Schema Features:
Basic Types:
string: Text data with optional constraints (minLength, maxLength, pattern, enum)
integer: Whole numbers with optional constraints (minimum, maximum, enum)
number: Decimal numbers with optional constraints (minimum, maximum, enum)
boolean: True/false values
null: Null values (often used with anyOf for optional fields)
Complex Types:
object: Key-value pairs with defined properties
array: Lists of items with defined item schemas
anyOf: Union types (e.g., string or null for optional fields)
String Formats:
Supported formats: date-time, time, date, duration, email, hostname, ipv4, ipv6, uuid, uri
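The type and format lists above lend themselves to a quick client-side check before a schema is submitted. The following Python sketch walks a schema and reports any `type` or `format` value outside the documented sets. The function name `find_unsupported` is hypothetical, and the traversal is only an assumption about how such a check could be written; the service's own validation may be stricter or behave differently.

```python
SUPPORTED_TYPES = {"string", "integer", "number", "boolean", "null", "object", "array"}
SUPPORTED_FORMATS = {
    "date-time", "time", "date", "duration", "email",
    "hostname", "ipv4", "ipv6", "uuid", "uri",
}


def find_unsupported(schema, path="#"):
    """Return (path, problem) pairs for types/formats outside the documented lists.

    Hypothetical helper: not part of the API, just a local pre-flight check.
    """
    problems = []
    if not isinstance(schema, dict):
        return problems

    # "type" may be a single name or a list of names.
    declared = schema.get("type")
    types = declared if isinstance(declared, list) else [declared]
    for t in types:
        if t is not None and t not in SUPPORTED_TYPES:
            problems.append((path, f"unsupported type {t!r}"))

    fmt = schema.get("format")
    if fmt is not None and fmt not in SUPPORTED_FORMATS:
        problems.append((path, f"unsupported format {fmt!r}"))

    # Recurse into the nested locations the documentation mentions.
    for key in ("properties", "$defs"):
        for name, sub in schema.get(key, {}).items():
            problems += find_unsupported(sub, f"{path}/{key}/{name}")
    if "items" in schema:
        problems += find_unsupported(schema["items"], f"{path}/items")
    for i, sub in enumerate(schema.get("anyOf", [])):
        problems += find_unsupported(sub, f"{path}/anyOf/{i}")
    return problems
```

An empty result only means that no unknown type or format name was found; it does not guarantee that the schema will be accepted.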
Schema Structure:
object type
$defs for reusable schema components
$ref to reference definitions
items schema defined
Constraints:
$ref definitions
Example Schemas:
Simple Company Schema:
{
  "type": "object",
  "properties": {
    "company_name": {
      "type": "string",
      "description": "The name of the company exactly as it appears in the document"
    },
    "form_type": {
      "type": "string",
      "enum": ["10-K", "10-Q", "8-K", "S-1"],
      "description": "The type of SEC form"
    },
    "trading_symbol": {
      "type": "string",
      "description": "The trading symbol of the company"
    },
    "zip_code": {
      "type": "integer",
      "description": "The zip code of the company headquarters"
    }
  },
  "required": ["company_name", "form_type", "trading_symbol", "zip_code"]
}
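To make the effect of `required`, `type`, and `enum` in the schema above concrete, here is a minimal Python sketch that checks a flat record against it. The helper `check_record` and the sample records are hypothetical and cover only these three keywords for flat objects; this is not how the service validates extractions.

```python
SIMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "form_type": {"type": "string", "enum": ["10-K", "10-Q", "8-K", "S-1"]},
        "trading_symbol": {"type": "string"},
        "zip_code": {"type": "integer"},
    },
    "required": ["company_name", "form_type", "trading_symbol", "zip_code"],
}

# Mapping from JSON Schema type names to Python types (flat scalars only).
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_record(record, schema):
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in record:
            continue
        value = record[name]
        spec_type = spec.get("type")
        expected = TYPE_MAP.get(spec_type) if isinstance(spec_type, str) else None
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected and (
            not isinstance(value, expected)
            or (isinstance(value, bool) and spec_type != "boolean")
        ):
            problems.append(f"{name}: expected {spec_type}")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: {value!r} not in enum")
    return problems
```

For example, a record with `"form_type": "10-X"` and no `zip_code` would be reported as having one missing field and one enum violation.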
Complex Resume Schema:
{
  "type": "object",
  "properties": {
    "personalInfo": {
      "type": "object",
      "properties": {
        "fullName": {"type": "string"},
        "contact": {
          "type": "object",
          "properties": {
            "emails": {
              "type": "array",
              "items": {"type": "string", "format": "email"}
            },
            "phones": {
              "type": "array",
              "items": {"type": "string"}
            }
          }
        }
      },
      "required": ["fullName"]
    },
    "workExperience": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "jobTitle": {"type": "string"},
          "company": {"type": "string"},
          "startDate": {"type": "string"},
          "endDate": {"type": ["string", "null"]},
          "isCurrent": {"type": "boolean"}
        },
        "required": ["jobTitle", "company", "startDate"]
      }
    }
  },
  "required": ["personalInfo", "workExperience"]
}
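The resume schema above makes `endDate` nullable and leaves it out of `required`. When schemas are built programmatically, the `anyOf`-with-`null` pattern listed under Complex Types can be wrapped in a small helper. The Python sketch below assumes a hypothetical function named `nullable`; it only constructs the schema fragment and says nothing about how the service treats the result.

```python
def nullable(schema):
    """Wrap a schema so that it also accepts null, using anyOf.

    Hypothetical helper for building optional fields.
    """
    return {"anyOf": [schema, {"type": "null"}]}


# Example: an end date that may be absent from the document.
end_date = nullable({"type": "string", "format": "date"})
```

Here `end_date` is an `anyOf` of a date-formatted string schema and a `null` schema, which can be placed under `properties` like any other field definition.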
Schema with References:
{
  "type": "object",
  "properties": {
    "algorithms": {
      "type": "array",
      "items": {"$ref": "#/$defs/algorithm"}
    }
  },
  "$defs": {
    "algorithm": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "description": {"type": "string"}
      },
      "required": ["name"]
    }
  }
}
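A reference such as `#/$defs/algorithm` is a JSON Pointer into the same document. The Python sketch below shows one way a local reference could be resolved by hand; the function name `resolve_ref` is hypothetical, and it deliberately handles only references that start with `#/`.

```python
def resolve_ref(root, ref):
    """Follow a local JSON Pointer reference such as '#/$defs/algorithm'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled here: {ref}")
    node = root
    for part in ref[2:].split("/"):
        # Undo JSON Pointer escaping (RFC 6901): ~1 -> '/', ~0 -> '~'.
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[part]
    return node


schema = {
    "type": "object",
    "properties": {
        "algorithms": {"type": "array", "items": {"$ref": "#/$defs/algorithm"}}
    },
    "$defs": {
        "algorithm": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        }
    },
}

# Look up the definition that the array items point to.
target = resolve_ref(schema, schema["properties"]["algorithms"]["items"]["$ref"])
```

In this sketch a reference to a missing definition raises `KeyError`, which makes dangling references easy to spot before submitting a schema.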
Best Practices:
enum for known values (e.g., form types, status values)
anyOf with null or omitting from required
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Request model for creating a new extraction schema.
The schema_definition must be a valid JSON Schema that defines the structure of data to extract from documents. The system supports a subset of JSON Schema 2020-12 features optimized for document extraction.
Name of the schema
JSON Schema definition. Must be a valid JSON Schema that defines the structure of data to extract from documents. See the comprehensive schema guide in the API documentation for detailed examples and supported features.
{
  "properties": {
    "company_name": {
      "description": "The name of the company exactly as it appears in the document",
      "type": "string"
    },
    "form_type": {
      "description": "The type of SEC form",
      "enum": ["10-K", "10-Q", "8-K", "S-1"],
      "type": "string"
    },
    "trading_symbol": {
      "description": "The trading symbol of the company",
      "type": "string"
    },
    "zip_code": {
      "description": "The zip code of the company headquarters",
      "type": "integer"
    }
  },
  "required": [
    "company_name",
    "form_type",
    "trading_symbol",
    "zip_code"
  ],
  "type": "object"
}
Description of the schema
Additional metadata for the schema
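Putting the request fields together with the Bearer header, a create-schema call might be assembled as in the Python sketch below. Only `schema_definition` is named in this page; the endpoint URL and the field names `name`, `description`, and `metadata` are assumptions made for illustration, so check them against the actual API before use. The sketch builds the request object without sending it.

```python
import json
import urllib.request

# Hypothetical endpoint: substitute the real URL from the API documentation.
API_URL = "https://api.example.com/schemas"


def build_create_schema_request(token, name, schema_definition,
                                description=None, metadata=None):
    """Build (but do not send) a POST request for creating a schema.

    The body keys "name", "description", and "metadata" are assumed names.
    """
    body = {"name": name, "schema_definition": schema_definition}
    if description is not None:
        body["description"] = description
    if metadata is not None:
        body["metadata"] = metadata
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer authentication header of the form "Bearer <token>".
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once the endpoint and field names have been confirmed.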
Successful Response
Response model for schema information.
Unique ID of the schema
Name of the schema
JSON schema definition
Schema metadata
Timestamp when the schema was created
Timestamp when the schema was last updated
Description of the schema