# Parse File

> Parse a file into structured Markdown and/or JSON. Files must be smaller than 300 MB and have fewer than 2,000 pages. DOC(X) and PPT(X) files are converted to PDF with LibreOffice, which may change the page count.

See our [blog post](https://contextual.ai/blog/document-parser-for-rag) and [code examples](https://github.com/ContextualAI/examples/blob/main/03-standalone-api/04-parse/parse.ipynb). Email [parse-feedback@contextual.ai](mailto:parse-feedback@contextual.ai) with any feedback or questions.
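
As a rough illustration, the following stdlib-only Python sketch builds a `multipart/form-data` request to `POST /parse` and reads the `job_id` from the response. The endpoint URL, the `raw_file` field, bearer authentication, and the `ParseResponseV1` / `HTTPValidationError` response shapes come from the OpenAPI spec on this page. The helper names (`encode_multipart`, `build_parse_request`, `submit_parse_job`) are hypothetical. Sending booleans as `"true"`/`"false"` and the file part as `application/octet-stream` are assumptions. The linked notebook remains the reference example.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request
import uuid

# Server URL + path from the OpenAPI spec on this page.
PARSE_URL = "https://api.contextual.ai/v1/parse"


def encode_multipart(fields: dict[str, str], file_field: str, filename: str,
                     file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks: list[bytes] = []
    for name, value in fields.items():
        chunks.append((f"--{boundary}\r\n"
                       f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                       f"{value}\r\n").encode())
    # Assumption: a generic content type is acceptable for the file part.
    chunks.append((f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{file_field}"; '
                   f'filename="{filename}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n").encode())
    chunks.append(file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_parse_request(api_key: str, filename: str, file_bytes: bytes,
                        **options: object) -> urllib.request.Request:
    """Build (but do not send) a POST /parse request.

    `options` are optional form fields from the spec, e.g. parse_mode="standard"
    or page_range="0-2,5". None values are skipped; booleans are sent as
    "true"/"false" (an assumption about how the server reads form booleans).
    """
    fields: dict[str, str] = {}
    for name, value in options.items():
        if value is None:
            continue
        fields[name] = str(value).lower() if isinstance(value, bool) else str(value)
    body, content_type = encode_multipart(fields, "raw_file", filename, file_bytes)
    return urllib.request.Request(
        PARSE_URL, data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": content_type},
    )


def submit_parse_job(request: urllib.request.Request) -> str:
    """Send the request and return job_id from the ParseResponseV1 body."""
    try:
        with urllib.request.urlopen(request) as response:
            return str(uuid.UUID(json.load(response)["job_id"]))
    except urllib.error.HTTPError as err:
        if err.code == 422:
            # HTTPValidationError: {"detail": [{"loc": [...], "msg": ..., "type": ...}]}
            problems = [f"{'.'.join(map(str, d['loc']))}: {d['msg']}"
                        for d in json.load(err)["detail"]]
            raise ValueError("; ".join(problems)) from err
        raise
```

For example, `submit_parse_job(build_parse_request(api_key, "report.pdf", Path("report.pdf").read_bytes(), parse_mode="standard"))` returns the job ID as a string. The parsed Markdown/JSON is not part of this response; it is presumably retrieved in a later step keyed by that job ID (see the linked notebook).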



## OpenAPI

````yaml api-reference/openapi.json post /parse
openapi: 3.1.0
info:
  title: Endpoints
  version: '1.0'
servers:
  - url: https://api.contextual.ai/v1
security:
  - BearerAuth: []
paths:
  /parse:
    post:
      tags:
        - /parse
      summary: Parse File
      description: >-
        Parse a file into structured Markdown and/or JSON. Files must be
        smaller than 300 MB and have fewer than 2,000 pages. DOC(X) and PPT(X)
        files are converted to PDF with LibreOffice, which may change the page
        count.


        See our [blog post](https://contextual.ai/blog/document-parser-for-rag)
        and [code
        examples](https://github.com/ContextualAI/examples/blob/main/03-standalone-api/04-parse/parse.ipynb).
        Email
        [parse-feedback@contextual.ai](mailto:parse-feedback@contextual.ai) with
        any feedback or questions.
      operationId: parse_parse_post
      requestBody:
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/Body_parse_parse_post'
        required: true
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ParseResponseV1'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
components:
  schemas:
    Body_parse_parse_post:
      properties:
        raw_file:
          type: string
          format: binary
          title: Raw File
          description: >-
            The file to be parsed. Supported file types: PDF, DOC/DOCX,
            PPT/PPTX, PNG, and JPG/JPEG.
        parse_mode:
          $ref: '#/components/schemas/ParseMode'
          description: >-
            The settings to use for parsing. `basic` is for simple, text-only
            documents. `standard` is for complex documents with images, complex
            hierarchy, and/or no natively encoded textual data (e.g. for scanned
            documents).
          default: standard
          examples:
            - standard
        enable_document_hierarchy:
          type: boolean
          title: Enable Document Hierarchy
          description: >-
            Adds a table of contents to the output with the structure of the
            entire parsed document. This feature is in beta. When enabled,
            heading levels (e.g. H1, H2, H3) are also parsed at higher quality.
            Not permitted in `basic` `parse_mode`, or if `page_range` is not
            continuous or does not start from page zero.
          examples:
            - true
        enable_split_tables:
          type: boolean
          title: Enable Split Tables
          description: >-
            Controls whether tables are split into multiple tables by row, with
            the headers propagated to each split. Use to improve LLM
            comprehension of very large tables. Not permitted in `basic`
            `parse_mode`.
          examples:
            - false
        max_split_table_cells:
          type: integer
          title: Max Split Table Cells
          description: >-
            Threshold number of table cells beyond which large tables are split
            if `enable_split_tables` is `true`. Must be null if
            `enable_split_tables` is `false`.
          examples:
            - null
          exclusiveMinimum: 0
        page_range:
          type: string
          title: Page Range
          description: >-
            Optional page range to parse, as a string. Format: comma-separated
            0-based page indices (e.g. `0,1,2,5,6`), which may be mixed with
            ranges inclusive of both ends (e.g. `0-2,5,6`).
          examples:
            - null
      type: object
      required:
        - raw_file
      title: Body_parse_parse_post
    ParseResponseV1:
      properties:
        job_id:
          type: string
          format: uuid
          title: Job ID
          description: Unique ID of the parse job
      type: object
      required:
        - job_id
      title: ParseResponseV1
      description: /parse response object.
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ParseMode:
      type: string
      enum:
        - basic
        - standard
      title: ParseMode
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API Key

````
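
The parameter rules in the spec above are spread across several field descriptions. This Python sketch gathers them into one client-side pre-check, so invalid combinations can be caught before uploading a large file. The function names are hypothetical. The `False` defaults for the two boolean flags are the sketch's own, since the spec lists no default for them, and treating duplicate or out-of-order page indices as harmless is an assumption. The server's own validation (reported as a `422` `HTTPValidationError`) remains authoritative.

```python
from __future__ import annotations

VALID_PARSE_MODES = ("basic", "standard")  # ParseMode enum values


def parse_page_range(spec: str) -> list[int]:
    """Expand a page_range string such as "0-2,5,6" into [0, 1, 2, 5, 6]."""
    pages: list[int] = []
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)  # ValueError if not numeric
            if start > end:
                raise ValueError(f"range {token!r} is descending")
            pages.extend(range(start, end + 1))  # both ends inclusive
        else:
            pages.append(int(token))  # single 0-based index
    return pages


def validate_parse_options(parse_mode: str = "standard",
                           enable_document_hierarchy: bool = False,
                           enable_split_tables: bool = False,
                           max_split_table_cells: int | None = None,
                           page_range: str | None = None) -> list[str]:
    """Return human-readable problems; an empty list means no rule was broken."""
    problems: list[str] = []
    if parse_mode not in VALID_PARSE_MODES:
        problems.append(f"parse_mode must be one of {VALID_PARSE_MODES}")

    pages = None
    if page_range is not None:
        try:
            pages = parse_page_range(page_range)
        except ValueError as exc:
            problems.append(f"invalid page_range: {exc}")

    if enable_document_hierarchy:
        if parse_mode == "basic":
            problems.append("enable_document_hierarchy is not permitted in basic mode")
        if pages is not None:
            # Continuous and starting at page 0 means covering 0..n-1 with no gaps.
            unique = sorted(set(pages))
            if unique != list(range(len(unique))):
                problems.append("enable_document_hierarchy needs a continuous "
                                "page_range starting at 0")

    if enable_split_tables and parse_mode == "basic":
        problems.append("enable_split_tables is not permitted in basic mode")

    if max_split_table_cells is not None:
        if not enable_split_tables:
            problems.append("max_split_table_cells must be null unless "
                            "enable_split_tables is true")
        if max_split_table_cells <= 0:
            problems.append("max_split_table_cells must be greater than 0")
    return problems
```

For example, `validate_parse_options(enable_document_hierarchy=True, page_range="0-2,5")` reports one problem, because pages 3 and 4 are missing from the range.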