POST /parse
Parse File
curl --request POST \
  --url https://api.contextual.ai/v1/parse \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form parse_mode=standard \
  --form enable_document_hierarchy=true \
  --form enable_split_tables=false \
  --form figure_caption_mode=concise \
  --form raw_file=@example-file
{
  "job_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a"
}
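
For readers not using curl, the same request can be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper name `build_parse_request` is hypothetical, the file part is labeled `application/octet-stream` as an assumption, and the request is only built, not sent.

```python
import uuid
import urllib.request

API_URL = "https://api.contextual.ai/v1/parse"  # URL taken from the curl example


def build_parse_request(token, file_name, file_bytes, fields):
    """Build (but do not send) a multipart/form-data POST for /parse.

    `fields` maps form field names (e.g. "parse_mode") to string values.
    """
    boundary = uuid.uuid4().hex  # random separator between form parts
    parts = []
    # One part per plain form field.
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(part.encode("utf-8"))
    # The file part is named raw_file, as in the reference.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="raw_file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part type
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body is expected to be the JSON object shown above.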

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

multipart/form-data
raw_file
file
required

The file to be parsed. The file type must be PDF, DOC / DOCX, or PPT / PPTX.

parse_mode
enum<string>

The settings to use for parsing. basic is for simple, text-only documents. standard is for complex documents with images, complex hierarchy, and/or no natively encoded textual data (e.g. for scanned documents).

Available options:
basic,
standard
enable_document_hierarchy
boolean

Adds a table of contents to the output that reflects the structure of the entire parsed document, and parses heading levels (e.g. H1, H2, H3) at higher quality. This feature is in beta. Not permitted when parse_mode is basic, or when page_range is not continuous or does not start from page zero.

Examples:

true

enable_split_tables
boolean

Controls whether tables are split by row into multiple tables, with the headers propagated to each. Use this to improve LLM comprehension of very large tables. Not permitted when parse_mode is basic.

Examples:

false

max_split_table_cells
integer

Threshold number of table cells beyond which large tables are split when enable_split_tables is true. Must be null if enable_split_tables is false.

Examples:

null

figure_caption_mode
enum<string>

Controls how thorough figure captions are. concise is short and minimizes chances of hallucinations. detailed is more thorough and can include commentary; this mode is in beta. Not permitted when parse_mode is basic.

Available options:
concise,
detailed
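
The restrictions described for the options above can be checked on the client before a request is sent. The sketch below is one possible reading, not the service's own validation: `build_parse_fields` is a hypothetical helper, and it assumes "not permitted" in basic mode means a boolean flag may not be true and figure_caption_mode may not be set at all.

```python
def build_parse_fields(parse_mode, enable_document_hierarchy=None,
                       enable_split_tables=None, max_split_table_cells=None,
                       figure_caption_mode=None):
    """Return form fields as strings, or raise ValueError for a forbidden combination."""
    if parse_mode not in ("basic", "standard"):
        raise ValueError("parse_mode must be 'basic' or 'standard'")
    if figure_caption_mode not in (None, "concise", "detailed"):
        raise ValueError("figure_caption_mode must be 'concise' or 'detailed'")
    # Assumed reading: these options are rejected only when actually switched on.
    if parse_mode == "basic" and (
        enable_document_hierarchy or enable_split_tables or figure_caption_mode
    ):
        raise ValueError("option not permitted when parse_mode is basic")
    # max_split_table_cells must stay null unless table splitting is enabled.
    if max_split_table_cells is not None and enable_split_tables is not True:
        raise ValueError("max_split_table_cells requires enable_split_tables=true")

    options = {
        "parse_mode": parse_mode,
        "enable_document_hierarchy": enable_document_hierarchy,
        "enable_split_tables": enable_split_tables,
        "max_split_table_cells": max_split_table_cells,
        "figure_caption_mode": figure_caption_mode,
    }
    fields = {}
    for name, value in options.items():
        if value is None:
            continue  # null options are simply left out of the form
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"  # lowercase, as in the curl example
        else:
            fields[name] = str(value)
    return fields
```

For example, `build_parse_fields("standard", enable_split_tables=True, max_split_table_cells=100)` yields three string fields, while passing `max_split_table_cells=100` without enabling table splitting raises `ValueError`.
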
page_range
string

Optional string representing the page range to be parsed. Format: comma-separated indexes (0-based, e.g. 0,1,2,5,6), or ranges inclusive of both ends (e.g. 0-2,5,6).

Examples:

null
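
To make the page_range format concrete, the following sketch expands a range string into explicit 0-based page indexes and checks the condition stated for enable_document_hierarchy. Both function names are hypothetical, and "continuous" is assumed to mean that the selected pages have no gaps.

```python
def parse_page_range(spec):
    """Expand a string like "0-2,5,6" into a list of 0-based page indexes."""
    pages = []
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            # Ranges include both ends, so "0-2" means pages 0, 1 and 2.
            start, end = (int(part) for part in token.split("-", 1))
            if end < start:
                raise ValueError(f"descending range: {token!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(token))
    return pages


def allows_document_hierarchy(spec):
    """True if the pages start at 0 and have no gaps (assumed meaning of 'continuous')."""
    pages = sorted(set(parse_page_range(spec)))
    return pages == list(range(len(pages)))
```

With these definitions, `parse_page_range("0-2,5,6")` and `parse_page_range("0,1,2,5,6")` select the same pages, and `allows_document_hierarchy` returns `True` for `"0-3"` but `False` for `"0-2,5,6"` and `"1-3"`.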

Response

Successful Response

/parse response object.

job_id
string<uuid>
required

Unique ID of the parse job
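
A client will usually want to pull job_id out of the response body and confirm it is a well-formed UUID before storing it. A minimal sketch, with `extract_job_id` as a hypothetical helper name:

```python
import json
import uuid


def extract_job_id(response_body):
    """Parse the /parse response JSON and return job_id as a uuid.UUID."""
    payload = json.loads(response_body)
    # uuid.UUID raises ValueError if the string is not a valid UUID.
    return uuid.UUID(payload["job_id"])
```

A missing job_id key raises `KeyError`, and a malformed value raises `ValueError`, so an unexpected response fails loudly instead of yielding an unusable ID.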
