Parse a file into structured Markdown and/or JSON. Files must be smaller than 300MB and have fewer than 2000 pages. We use LibreOffice to convert DOC(X) and PPT(X) files to PDF, which may affect the page count.
See our blog post and code examples. Email [email protected] with any feedback or questions.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
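A minimal sketch of attaching this header with Python's standard library. The URL is a placeholder, not the documented endpoint; the request is only built here, not sent; and the body encoding the endpoint expects for file uploads is not covered.

```python
import urllib.request

# Placeholder URL for illustration only; replace it with the actual /parse endpoint.
PARSE_URL = "https://api.example.com/parse"


def build_parse_request(token: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Build (without sending) a POST request that carries the Bearer auth header."""
    return urllib.request.Request(
        PARSE_URL,
        data=body,
        headers={
            # The value is the literal word "Bearer", a space, then the token.
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
        method="POST",
    )
```

For example, `build_parse_request("my-token", b"...", "application/octet-stream").get_header("Authorization")` returns `"Bearer my-token"`.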
The file to be parsed. The file type must be one of PDF, DOC/DOCX, PPT/PPTX, PNG, or JPG/JPEG.
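The type and size limits can be checked on the client before uploading. A minimal sketch follows; `validate_upload` is a hypothetical helper, and it assumes "300MB" means 300 × 10^6 bytes (the stricter reading, since the unit is not specified). The page limit is not checked, because DOC(X) and PPT(X) page counts may change during the server-side conversion to PDF.

```python
from pathlib import PurePath

# Accepted types, taken from the documented list (compared case-insensitively).
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".png", ".jpg", ".jpeg"}

# Assumption: "300MB" is treated as 300 * 10**6 bytes, the stricter of the
# decimal and binary readings.
MAX_BYTES = 300 * 10**6


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file clearly cannot be accepted.

    The 2000-page limit is not checked here: page counts for DOC(X) and
    PPT(X) files may change after conversion to PDF.
    """
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(no extension)'}")
    if size_bytes >= MAX_BYTES:
        raise ValueError("file must be smaller than 300MB")
```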
Deprecated and ignored. This field used to control which output types were saved.
The format of the output to be returned by the /parse endpoint.
Allowed values: markdown-document, markdown-per-page, blocks-per-page
The settings to use for parsing. basic is for simple, text-only documents. standard is for complex documents with images, complex hierarchy, and/or no natively encoded textual data (e.g. for scanned documents).
Allowed values: basic, standard. Default: "standard"
Adds a table of contents to the output with the structure of the entire parsed document. This feature is in beta. It also enables higher-quality parsing of heading levels (e.g. H1, H2, H3). Not permitted in basic parsing_mode, or if page_range is not continuous and/or does not start from page zero.
Default: true
Controls whether tables are split into multiple tables by row, with the headers propagated to each piece. Use this to improve LLM comprehension of very large tables. Not permitted in basic parsing_mode.
Default: false
Threshold number of table cells beyond which large tables are split if enable_split_tables is true. Must be null if enable_split_tables is false.
Default: null
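The row-wise split with header propagation can be illustrated with a short sketch. The service's actual splitting algorithm is not documented here; `split_table` is a hypothetical helper, and it assumes cells are counted as columns × (data rows + 1 header row).

```python
from typing import List, Sequence

Row = Sequence[str]


def split_table(header: Row, rows: List[Row], max_cells: int) -> List[List[Row]]:
    """Split a table by rows so each piece stays within max_cells cells.

    Every piece repeats the header row. Cells are counted as
    columns * (data rows + 1 header row); this counting rule is an assumption.
    """
    n_cols = len(header)
    if n_cols * (len(rows) + 1) <= max_cells:
        return [[header, *rows]]  # small enough: keep the table whole
    # Data rows that fit alongside the header, but always at least one.
    rows_per_piece = max(1, max_cells // n_cols - 1)
    return [
        [header, *rows[start:start + rows_per_piece]]
        for start in range(0, len(rows), rows_per_piece)
    ]
```

With a 2-column header, 5 data rows, and `max_cells=6`, this yields three pieces holding 2, 2, and 1 data rows, each starting with the header.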
Controls how thorough figure captions are. concise is short and minimizes the chance of hallucination. detailed is more thorough and can include commentary; this mode is in beta. Not permitted in basic parsing_mode.
Allowed values: concise, detailed. Default: "concise"
Optional string specifying the pages to parse. Format: comma-separated 0-based indexes (e.g. 0,1,2,5,6) or ranges inclusive of both ends (e.g. 0-2,5,6).
Default: null
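A minimal sketch of how a page_range string expands into page indexes, plus a check for the "continuous and starting from page zero" condition that the table-of-contents option requires. Both function names are hypothetical client-side helpers, not part of the API.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a page_range string such as "0-2,5,6" into [0, 1, 2, 5, 6]."""
    pages: List[int] = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            # A range like "0-2" includes both ends.
            start_text, end_text = item.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"range runs backwards: {item!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(item))  # int() rejects empty or non-numeric items
    return pages


def is_continuous_from_zero(pages: List[int]) -> bool:
    """True if the pages are exactly 0, 1, ..., n-1 (order and repeats ignored)."""
    unique = sorted(set(pages))
    return unique == list(range(len(unique)))
```

For example, `parse_page_range("0-2,5,6")` returns `[0, 1, 2, 5, 6]`, which is not continuous, so the table-of-contents option would not be permitted for that range.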
Whether to include the page image in the output. If this option is set to true, output_type must include markdown-per-page.
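Some of the cross-field rules above can be checked before sending a request. A minimal sketch, assuming output_type is passed as a list of strings; `check_options` is a hypothetical helper, and `split_threshold` and `include_page_image` are hypothetical names, since the actual field names for the cell threshold and the page-image option are not given here. Only a subset of the documented rules is covered.

```python
from typing import List, Optional, Sequence


def check_options(
    parsing_mode: str,
    output_type: Sequence[str],
    enable_split_tables: bool = False,
    split_threshold: Optional[int] = None,  # hypothetical name for the cell-threshold field
    include_page_image: bool = False,       # hypothetical name for the page-image field
) -> List[str]:
    """Return human-readable problems with an option combination (empty if none)."""
    problems: List[str] = []
    if parsing_mode not in ("basic", "standard"):
        problems.append("parsing_mode must be basic or standard")
    if parsing_mode == "basic" and enable_split_tables:
        problems.append("enable_split_tables is not permitted in basic parsing_mode")
    if not enable_split_tables and split_threshold is not None:
        problems.append("the split threshold must be null when enable_split_tables is false")
    if include_page_image and "markdown-per-page" not in output_type:
        problems.append("output_type must include markdown-per-page to include the page image")
    return problems
```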
Controls the string representation of tables in parsed output. Must be markdown or html.
Allowed values: markdown, html
Successful Response
/parse response object.
Unique ID of the parse job.