# Get Metrics
Source: https://docs.contextual.ai/api-reference/agents-query/get-metrics
api-reference/openapi.json get /agents/{agent_id}/metrics
Returns usage and user-provided feedback data. This information can be used for data-driven improvements and optimization.
# Get Retrieval Info
Source: https://docs.contextual.ai/api-reference/agents-query/get-retrieval-info
api-reference/openapi.json get /agents/{agent_id}/query/{message_id}/retrieval/info
Return metadata of the contents used to generate the response for a given message.
# Provide Feedback
Source: https://docs.contextual.ai/api-reference/agents-query/provide-feedback
api-reference/openapi.json post /agents/{agent_id}/feedback
Provide feedback for a generation or a retrieval. Feedback can be used to track overall `Agent` performance through the `Feedback` page in the Contextual UI, and as a basis for model fine-tuning.
If providing feedback on a retrieval, include the `message_id` from the `/query` response, and a `content_id` returned in the query's `retrieval_contents` list.
For feedback on generations, include `message_id` and do not include a `content_id`.
# Query
Source: https://docs.contextual.ai/api-reference/agents-query/query
api-reference/openapi.json post /agents/{agent_id}/query
Start a conversation with an `Agent` and receive its generated response, along with relevant retrieved data and attributions.
# Create Agent
Source: https://docs.contextual.ai/api-reference/agents/create-agent
api-reference/openapi.json post /agents
Create a new `Agent` with a specific configuration.
This creates a specialized RAG `Agent` which queries over one or multiple `Datastores` to retrieve relevant data on which its generations are grounded.
Retrieval and generation parameters are defined in the provided `Agent` configuration.
If no `datastore_id` is provided in the configuration, this API automatically creates an empty `Datastore` and configures the `Agent` to use the newly created `Datastore`.
> Note that self-serve users are currently required to create agents through our UI. Otherwise, they will receive the following message: "This endpoint is disabled as you need to go through checkout. Please use the UI to make this request."
# Delete Agent
Source: https://docs.contextual.ai/api-reference/agents/delete-agent
api-reference/openapi.json delete /agents/{agent_id}
Delete a given `Agent`. This is an irreversible operation.
Note: `Datastores` which are associated with the `Agent` will not be deleted, even if no other `Agent` is using them. To delete a `Datastore`, use the `DELETE /datastores/{datastore_id}` API.
# Edit Agent
Source: https://docs.contextual.ai/api-reference/agents/edit-agent
api-reference/openapi.json put /agents/{agent_id}
Modify a given `Agent` to utilize the provided configuration.
Fields not included in the request body will not be modified.
# Get Agent Metadata
Source: https://docs.contextual.ai/api-reference/agents/get-agent-metadata
api-reference/openapi.json get /agents/{agent_id}/metadata
Get metadata and configuration of a given `Agent`.
# List Agents
Source: https://docs.contextual.ai/api-reference/agents/list-agents
api-reference/openapi.json get /agents
Retrieve a list of all `Agents`.
# Reset Agent
Source: https://docs.contextual.ai/api-reference/agents/reset-agent
api-reference/openapi.json put /agents/{agent_id}/reset
Reset a given `Agent` to default configuration.
# Delete Document
Source: https://docs.contextual.ai/api-reference/datastores-documents/delete-document
api-reference/openapi.json delete /datastores/{datastore_id}/documents/{document_id}
Delete a given document from its `Datastore`. This operation is irreversible.
# Get Document Metadata
Source: https://docs.contextual.ai/api-reference/datastores-documents/get-document-metadata
api-reference/openapi.json get /datastores/{datastore_id}/documents/{document_id}/metadata
Get details of a given document, including its `name` and ingestion job `status`.
# Ingest Document
Source: https://docs.contextual.ai/api-reference/datastores-documents/ingest-document
api-reference/openapi.json post /datastores/{datastore_id}/documents
Ingest a document into a given `Datastore`.
Ingestion is an asynchronous task. Returns a document `id` which can be used to track the status of the ingestion job through calls to the `GET /datastores/{datastore_id}/documents/{document_id}/metadata` API.
This `id` can also be used to delete the document through the `DELETE /datastores/{datastore_id}/documents/{document_id}` API.
`file` must be a PDF, HTML, DOC(X), or PPT(X) file. The filename must end with one of the following extensions: `.pdf`, `.html`, `.htm`, `.mhtml`, `.doc`, `.docx`, `.ppt`, `.pptx`.
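A quick client-side check of the filename rule above; the extension list is taken verbatim from this section.

```python
# Allowed filename extensions for ingestion, as listed above.
ALLOWED_EXTENSIONS = {".pdf", ".html", ".htm", ".mhtml", ".doc", ".docx", ".ppt", ".pptx"}

def is_ingestible(filename: str) -> bool:
    """Return True if the filename ends with a supported extension."""
    return any(filename.lower().endswith(ext) for ext in ALLOWED_EXTENSIONS)

print(is_ingestible("report.pdf"))  # True
print(is_ingestible("notes.txt"))   # False
```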
# List Documents
Source: https://docs.contextual.ai/api-reference/datastores-documents/list-documents
api-reference/openapi.json get /datastores/{datastore_id}/documents
Get list of documents in a given `Datastore`, including document `id`, `name`, and ingestion job `status`.
Performs `cursor`-based pagination if the number of documents exceeds the requested `limit`. The returned `cursor` can be passed to the next `GET /datastores/{datastore_id}/documents` call to retrieve the next set of documents.
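The cursor flow can be sketched with a hypothetical `fetch_page` helper standing in for the `GET /datastores/{datastore_id}/documents` call; the `documents` and `next_cursor` response field names below are assumptions for illustration.

```python
def list_all_documents(fetch_page):
    """Drain a cursor-paginated listing. `fetch_page(cursor)` stands in for
    GET /datastores/{datastore_id}/documents?cursor=...; the `documents` and
    `next_cursor` field names are illustrative assumptions."""
    documents, cursor = [], None
    while True:
        page = fetch_page(cursor)
        documents.extend(page["documents"])
        cursor = page.get("next_cursor")
        if not cursor:  # no cursor returned: last page reached
            return documents

# Fake two-page backend to exercise the loop
pages = {None: {"documents": ["a", "b"], "next_cursor": "c1"},
         "c1": {"documents": ["c"]}}
print(list_all_documents(lambda cur: pages[cur]))  # ['a', 'b', 'c']
```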
# Update Document Metadata
Source: https://docs.contextual.ai/api-reference/datastores-documents/update-document-metadata
api-reference/openapi.json post /datastores/{datastore_id}/documents/{document_id}/metadata
Update the metadata of a given document. Metadata fields can be used to enrich the document's chunks and be added to the context at generation-time, or used purely for filtering. To mark a field as filter-only, prefix its key with `_`.
# Create Datastore
Source: https://docs.contextual.ai/api-reference/datastores/create-datastore
api-reference/openapi.json post /datastores
Create a new `Datastore`.
A `Datastore` is a collection of documents. Documents can be ingested into and deleted from a `Datastore`.
A `Datastore` can be linked to one or more `Agents`, and conversely, an `Agent` can be associated with one or more `Datastores` to ground its responses with relevant data. This flexible many-to-many relationship allows `Agents` to draw from multiple sources of information. This linkage of `Datastore` to `Agent` is done through the `Create Agent` or `Edit Agent` APIs.
> Note that self-serve users are currently required to create datastores through our UI. Otherwise, they will receive the following message: "This endpoint is disabled as you need to go through checkout. Please use the UI to make this request."
# Delete Datastore
Source: https://docs.contextual.ai/api-reference/datastores/delete-datastore
api-reference/openapi.json delete /datastores/{datastore_id}
Delete a given `Datastore`, including all the documents ingested into it. This operation is irreversible.
This operation will fail with status code 400 if there is an active `Agent` associated with the `Datastore`.
# Edit Datastore Configuration
Source: https://docs.contextual.ai/api-reference/datastores/edit-datastore-configuration
api-reference/openapi.json put /datastores/{datastore_id}
# Get Datastore Metadata
Source: https://docs.contextual.ai/api-reference/datastores/get-datastore-metadata
api-reference/openapi.json get /datastores/{datastore_id}/metadata
Get the details of a given `Datastore`, including its name, create time, and the list of `Agents` which are currently configured to use the `Datastore`.
# List Datastores
Source: https://docs.contextual.ai/api-reference/datastores/list-datastores
api-reference/openapi.json get /datastores
Retrieve a list of `Datastores`.
Performs `cursor`-based pagination if the number of `Datastores` exceeds the requested `limit`. The returned `cursor` can be passed to the next `GET /datastores` call to retrieve the next set of `Datastores`.
# Reset Datastore
Source: https://docs.contextual.ai/api-reference/datastores/reset-datastore
api-reference/openapi.json put /datastores/{datastore_id}/reset
Reset a given `Datastore`. This operation is irreversible and deletes all the documents associated with the `Datastore`.
# Generate
Source: https://docs.contextual.ai/api-reference/generate/generate
api-reference/openapi.json post /generate
Generate a response using Contextual's Grounded Language Model (GLM), an LLM engineered specifically to prioritize faithfulness to in-context retrievals over parametric knowledge to reduce hallucinations in Retrieval-Augmented Generation and agentic use cases.
The total request cannot exceed 32,000 tokens.
See our [blog post](https://contextual.ai/blog/introducing-grounded-language-model/) and [code examples](https://colab.research.google.com/github/ContextualAI/examples/blob/main/03-standalone-api/02-generate/generate.ipynb). Email [glm-feedback@contextual.ai](mailto:glm-feedback@contextual.ai) with any feedback or questions.
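A minimal request-body sketch for the endpoint above, assuming it accepts a `messages` list plus a `knowledge` list of in-context retrieval strings. The `model`, `messages`, and `knowledge` field names are assumptions for illustration; this section only specifies the endpoint and the 32,000-token request limit.

```python
# Sketch of a /generate request body. The `model`, `messages`, and
# `knowledge` field names are illustrative assumptions; this section only
# specifies the endpoint path and the 32,000-token request limit.
payload = {
    "model": "v1",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "What was Q3 revenue?"}],
    "knowledge": [
        "Q3 revenue was $28.1M, up 12% year over year.",
    ],
}
print(sorted(payload))  # ['knowledge', 'messages', 'model']
```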
# LMUnit
Source: https://docs.contextual.ai/api-reference/lmunit/lmunit
api-reference/openapi.json post /lmunit
Given a `query`, `response`, and a `unit_test`, return the response's `score` on the unit test on a 5-point continuous scale. The total input cannot exceed 7,000 tokens.
See a code example in [our blog post](https://contextual.ai/news/lmunit/). Email [lmunit-feedback@contextual.ai](mailto:lmunit-feedback@contextual.ai) with any feedback or questions.
>🚀 Obtain an LMUnit API key by completing [this form](https://contextual.ai/request-lmunit-api/)
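The request/response shape described above, sketched with a stubbed transport in place of the real `POST /lmunit` call. The `query`, `response`, `unit_test`, and `score` field names come from this section; the stub itself is for illustration only.

```python
def run_unit_test(post, query: str, response: str, unit_test: str) -> float:
    """Run one LMUnit check. `post(body)` stands in for POST /lmunit and
    should return the API's JSON; the score is on a 5-point continuous scale."""
    body = {"query": query, "response": response, "unit_test": unit_test}
    return post(body)["score"]

# Stub transport for illustration only
score = run_unit_test(
    lambda body: {"score": 4.2},
    query="Summarize the report.",
    response="The report covers Q3 revenue growth.",
    unit_test="Is the response concise?",
)
print(score)  # 4.2
```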
# Parse File
Source: https://docs.contextual.ai/api-reference/parse/parse-file
api-reference/openapi.json post /parse
Parse a file into structured Markdown and/or JSON. Files must be less than 100MB and 400 pages. We use LibreOffice to convert DOC(X) and PPT(X) files to PDF, which may affect page count.
See our [blog post](https://contextual.ai/blog/document-parser-for-rag) and [code examples](https://github.com/ContextualAI/examples/blob/main/03-standalone-api/04-parse/parse.ipynb). Email [parse-feedback@contextual.ai](mailto:parse-feedback@contextual.ai) with any feedback or questions.
# Parse List Jobs
Source: https://docs.contextual.ai/api-reference/parse/parse-list-jobs
api-reference/openapi.json get /parse/jobs
Get list of parse jobs, sorted from most recent to oldest.
Returns all jobs from the last 30 days, or since the optional `uploaded_after` timestamp.
# Parse Result
Source: https://docs.contextual.ai/api-reference/parse/parse-result
api-reference/openapi.json get /parse/jobs/{job_id}/results
Get the results of a parse job.
Parse job results are retained for up to 30 days after job creation. Fetching results for a parse job that is older than 30 days will return a 404 error.
# Parse Status
Source: https://docs.contextual.ai/api-reference/parse/parse-status
api-reference/openapi.json get /parse/jobs/{job_id}/status
Get the status of a parse job.
Parse job results are retained for up to 30 days after job creation. Fetching a status for a parse job that is older than 30 days will return a 404 error.
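Because parsing is asynchronous, clients typically poll this status endpoint until the job settles. A sketch with a hypothetical `get_status` callable standing in for `GET /parse/jobs/{job_id}/status`; the status values used here are illustrative assumptions.

```python
import time

def wait_for_parse_job(get_status, poll_seconds: float = 2.0, max_polls: int = 100):
    """Poll until the job leaves a pending state. `get_status()` stands in
    for GET /parse/jobs/{job_id}/status; the status values used here
    ('pending', 'processing', 'completed') are illustrative assumptions."""
    for _ in range(max_polls):
        status = get_status()["status"]
        if status not in ("pending", "processing"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("parse job did not finish within the polling budget")

# Fake status sequence to exercise the loop
states = iter([{"status": "pending"}, {"status": "processing"}, {"status": "completed"}])
print(wait_for_parse_job(lambda: next(states), poll_seconds=0))  # completed
```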
# Rerank
Source: https://docs.contextual.ai/api-reference/rerank/rerank
api-reference/openapi.json post /rerank
Rank a list of documents according to their relevance to a query primarily and your custom instructions secondarily. We evaluated the model on instructions for recency, document type, source, and metadata, and it can generalize to other instructions as well. The reranker supports multilinguality.
The total request cannot exceed 400,000 tokens. The combined length of the query, instruction and any document with its metadata must not exceed 8,000 tokens.
See our [blog post](https://contextual.ai/blog/introducing-instruction-following-reranker/) and [code examples](https://colab.research.google.com/github/ContextualAI/examples/blob/main/03-standalone-api/03-rerank/rerank.ipynb). Email [rerank-feedback@contextual.ai](mailto:rerank-feedback@contextual.ai) with any feedback or questions.
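The per-document budget above (query + instruction + one document with its metadata ≤ 8,000 tokens) can be pre-checked client-side. The ~4 characters/token ratio below is a crude heuristic, not the API's actual tokenizer.

```python
# Rough pre-flight check for the 8,000-token per-document budget described
# above. The ~4 characters/token ratio is a crude heuristic, not the API's
# actual tokenizer.
TOKEN_BUDGET = 8_000

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_budget(query: str, instruction: str, document: str, metadata: str = "") -> bool:
    """True if query + instruction + one document (with metadata) is likely under budget."""
    total = sum(estimate_tokens(t) for t in (query, instruction, document, metadata))
    return total <= TOKEN_BUDGET

print(fits_budget("latest filings?", "prefer recent documents", "short doc"))  # True
```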
# Get Users
Source: https://docs.contextual.ai/api-reference/users/get-users
api-reference/openapi.json get /users
Retrieve a list of `users`.
# Invite Users
Source: https://docs.contextual.ai/api-reference/users/invite-users
api-reference/openapi.json post /users
Invite users to the tenant. For each email, this checks whether the user already exists in the tenant and creates the user if not. Returns the list of user emails that are now in the tenant (both newly created and pre-existing users).
# Remove User
Source: https://docs.contextual.ai/api-reference/users/remove-user
api-reference/openapi.json delete /users
Delete a given `user`.
# Update User
Source: https://docs.contextual.ai/api-reference/users/update-user
api-reference/openapi.json put /users
Modify a given `User`.
Fields not included in the request body will not be modified.
# Node.js SDK
Source: https://docs.contextual.ai/sdks/node
# Python SDK
Source: https://docs.contextual.ai/sdks/python
# Advanced Query Features
Source: https://docs.contextual.ai/user-guides/advanced-query-guide
Learn how to leverage multi-turn and structured outputs
# Multi-turn conversations
Based on your use case, you may want to leverage our multi-turn capabilities. With multi-turn, agents can use prior conversation history when responding to new queries. This means agents will automatically use information in prior turns to resolve ambiguities in the query, fetch the appropriate retrievals, and generate the final answer.
|  | w/ Multi-turn | w/o Multi-turn |
| --- | --- | --- |
| **Turn 1** | **Query:** What is the merger price? | **Query:** What is the merger price? |
|  | **Response:**<br>Direct Answer: The merger price is \$11.50 per share in cash.<br>Rationale: This is explicitly stated in Section 2.1(a) of the contract, which defines the "Per Share Merger Consideration" as \$11.50 per share in cash, without interest.<br>Caveats: This price applies to all shares except for: (1) shares owned by Parent, Merger Sub, or their wholly owned subsidiaries (referred to as "Cancelled Shares"), (2) shares held in treasury by the Company, and (3) shares held by dissenting shareholders (referred to as "Dissenting Shares"). | **Response:**<br>Direct Answer: The merger price is \$11.50 per share in cash.<br>Rationale: This is explicitly stated in Section 2.1(a) of the contract, which defines the "Per Share Merger Consideration" as \$11.50 per share in cash, without interest.<br>Caveats: This price applies to all shares except for: (1) shares owned by Parent, Merger Sub, or their wholly owned subsidiaries (referred to as "Cancelled Shares"), (2) shares held in treasury by the Company, and (3) shares held by dissenting shareholders (referred to as "Dissenting Shares"). |
| **Turn 2** | **Query:** And when can this change? | **Query:** And when can this change? |
|  | **Response:**<br>Direct Answer: The merger price of \$11.50 per share in cash can be adjusted only in the event of a reclassification, stock split, combination, stock dividend or distribution, recapitalization, subdivision, merger, issuer tender or exchange offer, or other similar transaction that changes the number of shares outstanding.<br>Rationale: This is specified in Section 2.5 of the contract, which states that if the number of Shares or securities convertible into Shares changes due to certain corporate actions, "the Per Share Merger Consideration shall be equitably adjusted to provide to Parent and the holders of Shares the same economic effect as contemplated by this Agreement prior to such event."<br>Caveats: The adjustment is specifically limited to maintaining the same economic effect as originally contemplated, and the contract does not provide for adjustments due to other factors such as market conditions or changes in the company's financial performance. | **Response:**<br>Direct Answer: The change can occur after a 4-day notice period and only in response to an Intervening Event or a Superior Proposal, subject to certain conditions and requirements.<br>Rationale: This is based on Section 6.1(e) which outlines the specific conditions under which a Change of Recommendation can occur. It requires a 4-day notice period (the "Notice Period") and must be in response to either an Intervening Event or a Superior Proposal. The Board must also determine that failing to make the change would be inconsistent with their fiduciary duties.<br>Caveats: The change must follow specific procedural requirements including delivery of a written notice (Company Notice) with material details, and the Board must consult with financial advisors and outside legal counsel. |
## Enabling multi-turn behavior
Enabling multi-turn is an **agent-level** setting. If multi-turn is enabled, at query-time you can pass a `conversation_id`, which will:
1. Append the new query to the given conversation
2. Generate the response in a conversation-aware manner
If you do not pass a `conversation_id`, Contextual will assume you are beginning a new conversation with the query in the request as the initial message.
If multi-turn is **disabled at the agent-level**, passing the `conversation_id` will only append the query to the given conversation's history; the response will not be generated in a conversation-aware manner.
## Code Snippets
To enable or disable multi-turn for a given agent:
```python Python
from contextual import ContextualAI

client = ContextualAI(api_key=api_key)  # input your key

agent_id = ""  # input your agent_id

params = {
    "agent_configs": {
        "global_config": {
            "enable_multi_turn": True  # Set to False to disable multi-turn
        }
    }
}

client.agents.update(agent_id=agent_id, extra_body=params)
```
```bash Shell
curl "https://api.app.contextual.ai/v1/agents/{agent_id}" \
  --request PUT \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $API_KEY" \
  --data '{
    "agent_configs": {
      "global_config": {
        "enable_multi_turn": true
      }
    }
  }'
```
To pass the `conversation_id` when querying:
```python Python
import os

from contextual import ContextualAI

client = ContextualAI(
    api_key=os.environ.get("CONTEXTUAL_API_KEY"),
)

response = client.agents.query.create(
    # replace with your agent's id
    agent_id=agent_id,
    # replace with your query
    messages=[{"content": "", "role": "user"}],
    # replace with a valid conversation_id
    conversation_id=conversation_id,
)
```
# Structured outputs \[Beta]
Structured outputs is currently in beta and is missing some functionality, like attributions
Structured output mode is useful when you require query responses to follow a repeatable and easily parseable format. Example use cases include extracting pre-defined categories of information or generating responses that need to integrate seamlessly with downstream applications or workflows.
Contextual supports structured outputs in JSON format. To use structured outputs:
1. First, define your schema, which must be valid JSON and include the `json_schema` key.
```python Python
# define schema
schema = {
    "json_schema": {
        "type": "object",
        "properties": {
            "regions": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "region": {"type": "string"},
                        "revenue": {"type": "number"},
                        "share_of_revenue": {"type": "number"}
                    },
                    "required": ["region", "revenue", "share_of_revenue"]
                }
            }
        },
        "required": ["regions"]
    }
}
```
2. Then, pass the schema as part of your `/query` request. If you are using our Python SDK, the schema must be passed under the `structured_output` key in the `extra_body` param.
```python Python SDK
import os

from contextual import ContextualAI

# initialize your client
client = ContextualAI(
    api_key=os.environ.get("CONTEXTUAL_API_KEY"),
)

# define your schema and add it to the payload object under the
# structured_output key
payload = {
    "structured_output": schema
}

# pass the payload in the "extra_body" field, along with your other
# query parameters
response = client.agents.query.create(
    # replace with your agent's id
    agent_id="",
    # replace with your query
    messages=[{"content": "what was the regional revenue breakdown in 2022", "role": "user"}],
    # pass the schema in the `extra_body` param
    extra_body=payload,
)
```
3. Finally, parse the returned message as JSON:
```python Python
import json
results = json.loads(response.message.content)
```
Example output:
```json
{
  "regions": [
    {"region": "Americas", "revenue": 28079, "share_of_revenue": 44.0},
    {"region": "Europe, Middle East & Africa", "revenue": 25301, "share_of_revenue": 39.5},
    {"region": "Asia & Pacific", "revenue": 10486, "share_of_revenue": 16.4}
  ]
}
```
# Contextual AI Agent Parameters
Source: https://docs.contextual.ai/user-guides/agent-params
Learn how to customize the behavior and performance of your agents
This article describes the key parameters that are available when configuring RAG agents on the Contextual AI platform. Sensible defaults are applied for all settings, but they can be modified based on your preferences or to optimize performance against your data and query patterns.
## Standard Parameters
* **Datastores (datastore\_ids):** Datastores are the knowledgebases that your agent can access when answering queries. Files uploaded into a datastore are processed using Contextual's multi-modal document understanding pipeline, which prepares documents in ways optimized for end-to-end RAG performance. You must link at least one datastore to your agent, but you can specify more.
* **Generator Model (llm\_model\_id):** Determines which generator model powers your agent. You can use either our default Grounded Language Model or a version that has been [specifically tuned](https://docs.contextual.ai/user-guides/tune-eval-guide) to your use case. Tuned models can only be used with the agents on which they were tuned.
## System Prompts
**Note on Adherence:** Contextual has built agents to faithfully follow instructions. However, in some cases complete adherence cannot be guaranteed, especially where instructions are unclear, under-specified, or in conflict with other instructions you have given or our guardrails.
The System Prompts instruct the agent on how to respond to users’ queries given the retrieved knowledge. The appropriate prompt is passed, along with the user query and relevant retrievals, to the Generator Model at generation-time.
* **Core System Prompt (system\_prompt):** Defines how the agent interprets queries and generates responses. You can provide instructions about the agent’s persona and style, and the desired content and structure of the responses.
You are a helpful AI assistant created by Contextual AI to answer questions about relevant documentation provided to you. Your responses should be precise, accurate, and sourced exclusively from the provided information. Please follow these guidelines:
\* Only use information from the provided documentation. Avoid opinions, speculation, or assumptions.
\* Use the exact terminology and descriptions found in the provided content.
\* Keep answers concise and relevant to the user's question.
\* Use acronyms and abbreviations exactly as they appear in the documentation or query.
\* Apply markdown if your response includes lists, tables, or code.
\* Directly answer the question, then STOP. Avoid additional explanations unless specifically relevant.
\* If the information is irrelevant, simply respond that you don't have relevant documentation and do not provide additional comments or suggestions. Ignore anything that cannot be used to directly answer this query.
* **No Retrieval System Prompt (no\_retrieval\_system\_prompt):** Defines the agent’s behavior if, after the retrieval, reranking, and filter steps, no relevant knowledge has been identified. You can use this prompt to define boilerplate refusals, offer help and guidance, provide information about the document store, and specify other contextually-appropriate ways that the agent should respond.
You are an AI RAG agent created by Contextual to help answer questions and complete tasks posed by users. Your capabilities include accurately retrieving/reranking information from linked datastores and using these retrievals to generate factual, grounded responses. You are powered by leading document parsing, retrieval, reranking, and grounded generation models. Users can impact the information you have access to by uploading files into your linked datastores. Full documentation, API specs, and guides on how to use Contextual, including agents like yourself, can be found at [docs.contextual.ai](http://docs.contextual.ai).
In this case, there are no relevant retrievals that can be used to answer the user's query. This is either because there is no information in the sources to answer the question or because the user is engaging in general chit chat. Respond according to the following guidelines:
* If the user is engaging in general pleasantries ("hi", "how goes it", etc.), you can respond in kind. But limit this to only a brief greeting or general acknowledgement
* Your response should center on describing your capabilities and encouraging the user to ask a question that is relevant to the sources you have access to. You can also encourage them to upload relevant documents and files to your linked datastore(s).
* DO NOT answer, muse about, or follow-up on any general questions or asks. DO NOT assume that you have knowledge about any particular topic. DO NOT assume access to any particular source of information.
* DO NOT engage in character play. You must maintain a friendly, professional, and neutral tone at all times
## Query Understanding
These settings affect if and how user queries are modified to improve retrieval performance and response generation.
* **Enable Multi-turn (enable\_multi\_turn)**: Allows the agent to remember and reference previous parts of the conversation, making interactions feel more natural and continuous. When enabled, the user’s query will automatically be reformulated based on prior turns to resolve ambiguities. The conversation history is prepended to the query at generation-time.
* **Check Retrieval Need (should\_check\_retrieval\_need)**: Enables a check for whether the user query is general chit-chat or an addressable question. If the query is general chit-chat, the intermediate retrieval steps are skipped and the generator is called using the `no_retrieval_system_prompt`. If it is an addressable question, the RAG pipeline as configured is executed.
## Query Reformulation
These settings allow you to modify the original user query prior to retrieval and generation. These strategies can help improve retrieval accuracy or completeness.
* **Enable Query Expansion (enable\_query\_expansion)**: Toggles the query expansion module `on` or `off`. When enabled, the user's original query will be rewritten according to the instructions set out in the prompt, guided by any provided examples. If no prompt or examples are given, the default prompt is used.
* **Instructions:** An optional parameter that specifies how queries should be reformulated.
* **Examples:** An optional parameter that provides few shot examples of how queries should be reformulated based on the provided instructions.
Instructions: Reformulate the query so that it is more detailed and includes relevant terminology or topics that will be helpful in maximizing the quality of the information retrieved to answer the query. \
\
Example 1: \
Original query: What are JPMorgan's results this quarter? \
Expanded query: Can you provide the latest financial results for JPMorgan, including revenue, earnings per share, and key metrics for the most recent quarter? \
\
Example 2: \
Original query: What is data cleaning? \
Expanded query: Could you explain the concept of data cleaning, including common techniques used, typical challenges faced, and its role in the data preprocessing pipeline for machine learning models? \
\
Example 3: \
Original query: What are the results of Apple this quarter? \
Expanded query: Can you provide the latest financial results for Apple, including revenue, earnings per share, and key metrics for the most recent quarter?
* **Enable Query Decomposition (enable\_query\_decomposition):** Toggles the query decomposition module `on` or `off`. When enabled, the module will break down complex and compound queries into a series of simpler queries. Each sub-query will have individual retrievals, which are then intelligently combined prior to any subsequent reranking, filtering, or generation steps. Simple queries, as judged by the model, will not be decomposed.
* **Examples:** An optional parameter that provides few shot examples of how an input query should be decomposed into subqueries.
## Retrieval
These settings determine how the agent performs the initial retrieval from linked unstructured datastores.
* **Number of Retrieved Chunks (top\_k\_retrieved\_chunks):** The maximum number of chunks that should be retrieved from the linked datastore(s). For example, if set to 10, the agent will retrieve at most 10 relevant chunks to generate its response. The value ranges from 1 to 200, and the default value is 100.
* **Lexical Search Weight (lexical\_alpha):** When chunks are scored during retrieval (based on their relevance to the user query), this controls how much weight the scoring gives to exact keyword matches. A higher value means the agent will prioritize finding exact word matches, while a lower value allows for more flexible, meaning-based matching. The value ranges from 0 to 1, and we default to 0.1. You can increase this weight if exact terminology, specific entities, or specialized vocabulary is important for your use case.
* **Semantic Search Weight (semantic\_alpha):** When chunks are scored during retrieval (based on their relevance to the user query), this controls how much weight the scoring gives to semantic similarity. Semantic searching looks for text that conveys similar meaning and concepts, rather than just matching exact keywords or phrases. The value ranges from 0 to 1, and we default to 0.9. The value of the semantic search weight and lexical search weight must sum to 1.
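The two weights above combine into a single hybrid relevance score per chunk. A sketch of that combination, assuming a simple weighted sum; the doc only specifies the defaults and that the weights must sum to 1, so the exact scoring formula here is an assumption.

```python
def hybrid_score(lexical_score: float, semantic_score: float,
                 lexical_alpha: float = 0.1, semantic_alpha: float = 0.9) -> float:
    """Weighted blend of lexical and semantic relevance. The weighted-sum
    form is an illustrative assumption; this section only specifies the
    default weights (0.1 / 0.9) and that they must sum to 1."""
    if abs(lexical_alpha + semantic_alpha - 1.0) > 1e-9:
        raise ValueError("lexical_alpha and semantic_alpha must sum to 1")
    return lexical_alpha * lexical_score + semantic_alpha * semantic_score

print(hybrid_score(0.5, 1.0))  # ≈ 0.95 with the default 0.1/0.9 weights
```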
## Rerank and Filter
These settings affect how the agent reranks and filters chunks before passing them to the generator model.
* **Enable Reranking (enable\_rerank):** Allows the agent to take the initially retrieved document chunks and rerank them based on the provided instructions and user query. The top reranked chunks are passed on for filtering and final response generation.
* **Rerank instructions (rerank\_instructions):** Natural language instructions that describe your preferences for the ranking of chunks, such as prioritizing chunks from particular sources or time periods. Chunks will be rescored based on these instructions and the user query. If no instruction is given, the rerank scores are based only on relevance of chunks to the query.
* **Number of Reranked Chunks (top\_k\_reranked\_chunks):** The number of top reranked chunks that are passed on for generation.
* **Reranker score filter (reranker\_score\_filter\_threshold)**: If the value is set to greater than 0, chunks with relevance scores below the set value will not be passed on. Input must be between 0 and 1.
* **Enable Filtering (enable\_filter):** Allows the agent to perform a final filtering step prior to generation. When enabled, chunks are checked against the filter prompt and irrelevant chunks are filtered out. This acts like a final quality control checkpoint, helping to ensure that only relevant chunks are passed to the generator. This filter can improve response accuracy and relevance, but also increase the false refusal rate if the configuration is too strict.
* **Filter Prompt (filter\_prompt):** Natural language instructions that describes the criteria for relevant and irrelevant chunks. It can be used in tandem with, or as an alternative to, the reranker score-based filtering.
## Generate
These settings affect how the generator model produces responses.
* **Max New Tokens (max\_new\_tokens):** Controls the maximum length of the agent’s response. Defaults to 2,048 tokens.
* **Temperature (temperature):** Controls how creative the agent's responses are. A higher temperature means more creative and varied responses, while a lower temperature results in more consistent, predictable answers. It ranges from 0 to 1. Defaults to 0.
* **Top P (top\_p):** Similar to temperature, this parameter also controls response variety. It determines how many different word choices the agent considers when generating its response. Defaults to 0.9.
* **Frequency Penalty (frequency\_penalty):** Helps prevent repetition in responses by making the agent less likely to use words it has already used frequently. This helps ensure more natural, varied language. Defaults to 0.
* **Random Seed (seed):** Controls randomness in how the agent selects the next tokens during text generation. Allows for reproducible results, which can be useful for testing.
* **Enable Groundedness Scores (calculate\_groundedness)**. Enables the agent to provide groundedness scores as part of its response. When enabled, the agent identifies distinct claims in the response and assesses whether each one is grounded in the retrieved document chunks. Claims that are not grounded are shown in yellow in the UI. Defaults off.
* **Disable Commentary (avoid\_commentary):** Flag that indicates whether the Agent should only output strictly factual information grounded in the retrieved knowledge, instead of the complete response (which can include commentary, analysis, etc.).
## Miscellaneous
* **Suggested queries (suggested\_queries):** Example queries that appear in the user interface when first interacting with the agent. You can provide both simple and complex examples to help users understand the full range of questions your agent can handle. This helps set user expectations and guides them toward effective interactions with the agent.
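As a sketch of how these settings fit together, the dictionary below maps the parameter names above to example values. It is illustrative only: the exact request structure, such as how the fields nest inside the agent configuration, should be taken from the Create Agent and Edit Agent API references.

```python
# Illustrative sketch only: maps the parameter names above to example values.
# The exact request schema (e.g., nesting under an agent_configs object) may
# differ; consult the Create/Edit Agent API reference for the real structure.
agent_settings = {
    # Retrieve
    "rerank_instructions": "Prefer chunks from 2024 filings over older documents.",
    "top_k_reranked_chunks": 10,
    "reranker_score_filter_threshold": 0.2,   # must be between 0 and 1
    "enable_filter": True,
    "filter_prompt": "Keep only chunks that discuss quarterly financial results.",
    # Generate
    "max_new_tokens": 2048,
    "temperature": 0,
    "top_p": 0.9,
    "frequency_penalty": 0,
    "seed": 42,
    "calculate_groundedness": True,
    "avoid_commentary": False,
    # Miscellaneous
    "suggested_queries": ["What was Q3 revenue?", "Summarize the risk factors."],
}
```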
# Beginner's Guide
Source: https://docs.contextual.ai/user-guides/beginner-guide
Create a specialized RAG agent in less than 5 minutes
[Contextual](https://contextual.ai/) provides a platform for creating enterprise-grade AI agents, grounded in your documents and data. See a demo of our agents [here](https://contextual.ai/platform/). Our integrated system allows you to easily:
* Parse and ingest documents into fully managed vector stores
* Create specialized agents that can answer questions and complete tasks by fetching relevant knowledge from your datastores
With Contextual, you get:
* **Best-in-class parsing capabilities and powerful chunking options** that make wrangling unstructured data for RAG a breeze
* **Seamless orchestration** of powerful RAG components and primitives, allowing you to build RAG agents that have exceptional accuracy and scalability
* **State-of-the-art** capabilities and models, like our:
* [Instruction-following reranker](https://contextual.ai/blog/introducing-instruction-following-reranker/) that tops benchmarks like BEIR
* [Grounded Language Models](https://www.kaggle.com/benchmarks/google/facts-grounding/leaderboard) that are the best in the world at being grounded and factual
* [LMUnit model](https://contextual.ai/lmunit/) that is extremely effective at fine- and coarse-grained evaluations of LLM output, using natural language unit tests
You can leverage our platform via our user-friendly UI or through our APIs and SDKs.
Follow this guide to create your first agent! Or, see [https://github.com/ContextualAI/examples](https://github.com/ContextualAI/examples) for easy-to-follow Jupyter notebooks for using our APIs.
***
## Get your API Key
**Note:** If you do not have access to the platform, you can create a workspace via the [**Start Free**](https://app.contextual.ai/?signup=1) button in the upper right of the page. New workspaces are given \$25 in free credits to trial the platform.
Contextual uses API keys to authenticate requests. Only **admins** within a workspace can create API keys. To create a key:

1. Log into your tenant at [app.contextual.ai](http://app.contextual.ai)
2. Click on **API Keys** in the sidebar
3. Click the **Create API Key** button in the upper right and follow the instructions
4. Save the generated key in a secure location
***
## Create and query your first agent
### Step 1: Create a datastore
Datastores contain the files that your agent(s) can access. Each agent must be associated with at least one datastore. You can create a datastore using the `/datastores` endpoint with the following command:
```python Python
from contextual import ContextualAI
# Initialize the client with your API key
contextual = ContextualAI(api_key="API_KEY")
# Create a datastore
datastore = contextual.datastores.create(name="Test Datastore")
```
```shell Shell
curl --request POST \
--url https://api.contextual.ai/v1/datastores \
--header 'accept: application/json' \
--header 'authorization: Bearer $API_KEY' \
--header 'content-type: application/json' \
--data '{"name":"Test Datastore"}'
```
Remember to replace `$API_KEY` with your key. You can rename the datastore if
you want.
If the request is successful, the `id` of the newly created datastore will be returned to you. Be sure to save this `id` as you will need it in subsequent steps!
### Step 2: Add documents into your datastore
Now that you've created a datastore, you can add documents to it. All documents are stored securely in the Contextual platform, and are parsed in ways optimized for use in RAG pipelines and Agents.
* If you don't have your own documents handy, feel free to use our Beginner's Guide test documents, [found here](https://drive.google.com/drive/folders/1e2qfAp6rrpcOGyDdBHwHP-zKzC6K7nyg?usp=drive_link)
* For the best results, use renderable PDFs, i.e., documents that have text that can be copied and pasted.
You can upload a single document using the following command:
```python Python
from contextual import ContextualAI
# Initialize the client with your API key
contextual = ContextualAI(api_key="API_KEY")
# Upload a document
# datastore_id is the `id` returned when you created the datastore
with open('file.pdf', 'rb') as f:
    ingestion_result = contextual.datastores.documents.ingest(datastore_id, file=f)
document_id = ingestion_result.id
print(f"Successfully uploaded document_id: {document_id} to datastore_id: {datastore_id}")
```
```shell Shell
curl --request POST \
--url https://api.contextual.ai/v1/datastores/{datastore_id}/documents \
--header 'accept: application/json' \
--header 'authorization: Bearer $API_KEY' \
--header 'content-type: multipart/form-data' \
--form file=@'${file_path}'
```
**Remember to:**
* Replace `{datastore_id}` in the url path with the datastore id from the previous step
* Replace `$API_KEY` with your API key
* Replace `{file_path}` with the path to the document on your machine
If the request is successful, the `id` of the uploaded document will be returned to you. The time required to upload documents depends partly on their length and features. Some documents may require a few minutes to fully process after upload.
To check the status of documents uploaded into the datastore, use this command:
```python Python
from contextual import ContextualAI
# Initialize the client with your API key
contextual = ContextualAI(api_key="API_KEY")
# Get the status of documents in the datastore
metadata = contextual.datastores.documents.metadata(datastore_id=datastore_id, document_id=document_id)
print("Document metadata:", metadata)
```
```shell Shell
curl --request GET \
--url https://api.contextual.ai/v1/datastores/{datastore_id}/documents \
--header 'accept: application/json' \
--header 'authorization: Bearer $API_KEY'
```
**Remember to:**
* Replace `{datastore_id}` in the url path with the `id` from the previous step
* Replace `$API_KEY` with your API key
You should see the document you uploaded in the list, along with its `ingestion_job_status`.
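Because processing can take a few minutes, it is often convenient to poll this metadata until the document reaches a terminal state. A minimal sketch, assuming the status strings `completed` and `failed` are terminal (check the values your datastore actually returns):

```python
import time

def wait_for_ingestion(get_status, timeout_s=600, poll_s=5):
    """Poll a zero-argument status callable until ingestion finishes or times out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = get_status()
        if status in ("completed", "failed"):  # assumed terminal statuses
            return status
        time.sleep(poll_s)
    raise TimeoutError("document ingestion did not finish in time")
```

You would call it with the metadata lookup from above, e.g. `wait_for_ingestion(lambda: contextual.datastores.documents.metadata(datastore_id=datastore_id, document_id=document_id).ingestion_job_status)`.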
### Step 3: Create an agent
Now that you have a datastore with some files, you can use the `/agents` endpoint to create your first agent.
```python Python
from contextual import ContextualAI
# Initialize the client with your API key
contextual = ContextualAI(api_key="API_KEY")
# Create an agent
agent = contextual.agents.create(name="Test Agent", description="Test Agent", datastore_ids=[datastore_id])
```
```shell Shell
curl --request POST \
--url https://api.contextual.ai/v1/agents \
--header 'accept: application/json' \
--header 'authorization: Bearer $API_KEY' \
--header 'content-type: application/json' \
--data '
{
"name": "Test",
"description": "Test Agent",
"datastore_ids": []
}
'
```
**Remember to:**
* Replace `$API_KEY` with your API key
* Populate the `datastore_ids` list field with the datastore `id` from above
If the request is successful, the `agent_id` of the newly created agent will be returned to you. You'll need this to query your agent in the next step.
### Step 4: Query your agent
Now that you've set up an agent and uploaded documents for it to use, you can use the `/query` endpoint to send messages:
```python Python
from contextual import ContextualAI
# Initialize the client with your API key
contextual = ContextualAI(api_key="API_KEY")
# Query the agent
response = contextual.agents.query.create(
    agent_id=agent_id,
    messages=[{
        "role": "user",
        "content": "What is the revenue of Apple?"
    }]
)
```
```shell Shell
curl --request POST \
--url https://api.contextual.ai/v1/agents/{agent_id}/query \
--header 'accept: application/json' \
--header 'authorization: Bearer $API_KEY' \
--header 'content-type: application/json' \
--data '
{
"stream": false,
"messages": [
{
"role": "user",
"content": "What is the revenue of Apple?"
}
]
}
'
```
**Remember to:**
* Replace `{agent_id}` in the url path with the agent\_id from the previous step
* Replace `$API_KEY` with your API key
* Replace the `content` field with a question that is relevant to the document(s) you uploaded
If the request is successful, you'll receive a response back that will contain:
* The body of the response
* The sources retrieved from the datastore that are relevant to the response
* Attributions/citations of sources to specific text spans in the response
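To sketch what consuming that response looks like, the snippet below pulls those pieces out of an example JSON body. The field names (`message`, `retrieval_contents`, `attributions`) mirror the API reference, but the exact shapes shown here are illustrative, so verify them against the Query endpoint's response schema.

```python
import json

# Illustrative response body; the real field shapes may differ.
raw = json.loads("""
{
  "message": {"role": "assistant", "content": "Apple's revenue was $383B."},
  "retrieval_contents": [
    {"content_id": "c1", "doc_name": "apple-10k.pdf", "page": 24}
  ],
  "attributions": [
    {"content_ids": ["c1"], "start_idx": 0, "end_idx": 30}
  ]
}
""")

answer = raw["message"]["content"]                              # body of the response
sources = [c["doc_name"] for c in raw["retrieval_contents"]]    # retrieved sources
print(answer)
print(sources)
```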
**Note:** You can only query your agent once at least one document in the datastore has been processed. You can check the status of uploaded documents by following the instructions in the previous step.
🙌 Congratulations! You've now created a basic agent in the Contextual platform.
***
## Try out additional functionality
Now that you have a basic working agent, explore the advanced features covered in the rest of these guides.
# Pricing and Billing
Source: https://docs.contextual.ai/user-guides/billing
Pricing and billing guide for on-demand, usage-based customers
**Note:** This guide is for self-serve customers using Contextual AI in On-Demand mode. Customers on Provisioned Throughput should contact their account managers for any billing-related questions.
## Usage Modes
Contextual AI supports two usage modes:
| **Mode** | **Description** | **Good For** |
| :---------------------------------------------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **On-Demand**
*Get started immediately by creating a Contextual Workspace. New workspaces receive \$25 in free credits.* | Pay-as-you-go with no commitments or minimums
- Purchase credits upfront, which are then drawn down based on your usage, according to the pricing below
- Easily graduate to Provisioned Throughput as your needs evolve | Customers who:
- Are experimenting with, or evaluating, the platform
- Need the flexibility to scale usage and costs up or down quickly
- Don't require SLAs or QPS guarantees |
| **Provisioned Throughput**
*Contact sales ([sales@contextual.ai](mailto:sales@contextual.ai)) or your account team.* | Purchase model units (MUs) to reserve guaranteed capacity and predictable throughput | Customers who:
- Need guaranteed and predictable QPS levels to satisfy expected production volumes
- Require service level agreements (SLAs) to meet production standards |
### Services
Regardless of usage mode, customers can also purchase expert support services for fixed terms. With a service package, you get dedicated support from our Sales Engineering and Customer Machine Learning Engineering team to optimize performance and design full solutions. Contact sales or your account manager for more information.
## On-Demand Pricing
In on-demand mode, usage of Contextual -- whether through the UI or APIs -- is charged as follows:
**Note:** Pricing is subject to change. This pricing sheet reflects the latest information as of July 2025.
* `/query` is the endpoint called when you interact with an **Agent**. It invokes a multi-step RAG pipeline, and the cost of a given query is the sum of the costs for each step in the pipeline. These steps can be configured via **Agent Settings**. The average cost per query is \~\$0.05.
| **Step** | **Price** |
| :--------------------------------------------------------------------- | :--------------------------------------------------------------- |
| ***Query Optimization***
(reformulation and decomposition) | \$1/1M toks |
| ***Encode***
(encode the user query for search) | \$0.03/1M toks |
| ***Rerank***
(rerank the retrieved chunks) | \$0.12/1M toks |
| ***Filter***
(filter chunks prior to generation) | \$1/1M toks |
| ***Generate***
(final response generation) | *Input:*
\$3/1M toks
*Output:*
\$15/1M toks |
| ***Groundedness & Safety***
(post-generation groundedness checks) | \$1/1M toks |
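To make the arithmetic concrete, the sketch below sums the pipeline steps for one hypothetical query. The token counts are invented for illustration; the per-step prices come from the table above.

```python
# Per-step prices in $ per 1M tokens (from the table above);
# for Generate, input and output tokens are priced separately.
PRICES = {
    "query_optimization": 1.00,
    "encode": 0.03,
    "rerank": 0.12,
    "filter": 1.00,
    "generate_input": 3.00,
    "generate_output": 15.00,
    "groundedness": 1.00,
}

# Hypothetical token counts for a single query (illustration only).
tokens = {
    "query_optimization": 2_000,
    "encode": 100,
    "rerank": 30_000,
    "filter": 8_000,
    "generate_input": 6_000,
    "generate_output": 700,
    "groundedness": 1_500,
}

cost = sum(PRICES[step] * tokens[step] / 1_000_000 for step in PRICES)
print(f"${cost:.4f} per query")  # → $0.0436 per query, near the ~$0.05 average
```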
* Relevant API documentation: [Agent Creation](https://docs.contextual.ai/api-reference/agents/create-agent) and [Query](https://docs.contextual.ai/api-reference/agents-query/query)
* Component APIs allow you to leverage key capabilities and models in a modular way.
| **Component** | **Price** |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------- |
| ***Parse***
Parse unstructured documents into markdown and/or JSON.
([Parse API Docs](https://docs.contextual.ai/api-reference/parse/parse-file)) | *Basic:*
\$3/1K pages
*Standard:*
\$40/1K pages |
| ***Rerank***
Rerank based on natural language instructions
([Rerank API Docs](https://docs.contextual.ai/api-reference/rerank/rerank)) | \$0.12/1M toks |
| ***Generate***
Generate using the most grounded language model in the world
([Generate API Docs](https://docs.contextual.ai/api-reference/generate/generate)) | *Input:*
\$3/1M toks
*Output:*
\$15/1M toks |
| ***LMUnit***
Evaluate LLM output using natural language unit tests
([LMUnit API Docs](https://docs.contextual.ai/api-reference/lmunit/lmunit)) | \$3/1M toks |
* Documents added to **Datastores** are sent through our ingestion pipeline where they are parsed, chunked, and indexed in a vector store.
| **Step** | **Price** |
| :-------------- | :--------------- |
| ***Ingestion*** | \$48.50/1K pages |
* Relevant API documentation: [Ingest](https://docs.contextual.ai/api-reference/datastores-documents/ingest-document)
## Usage Stats and Credit Top Up
### Viewing Usage and Spend
To see usage stats, click on **Usage** in the sidebar.

The usage page shows a month-by-month breakdown of your usage and spend across the various charged components and endpoints.

### Credit Top-Up
To continue using Contextual once your credits are depleted, you must top-up. To top-up credits:
1. Navigate to the **Billing** page in your workspace

2. Add a valid credit card under **Payment Methods**, if you haven't already

3. Click **Add Credits** and fill out the corresponding form

4. Click **Next** and confirm the payment

### Adding and Managing Payment Methods
In order to top-up credits, you must have a valid payment method linked to your Contextual workspace. To link a payment method:
1. Navigate to the **Billing** page in your workspace

2. Click on **Payment Methods** in the **Learn More** section

3. Click the **+New** button in the upper right of the page

4. Fill out the card or bank information in the form, then click **Next**

5. Fill out your billing address, then click **Save Payment Method**

To remove an existing stored payment method:
1. Locate it in the list of payment methods
2. Click the menu icon
3. Click **Remove Payment Method**

# Key Concepts
Source: https://docs.contextual.ai/user-guides/key-concepts
Core concepts and terminology used across the Contextual AI platform
## RAG
Retrieval Augmented Generation, or `RAG`, is a technique that improves language model generation by incorporating external knowledge. Contextual Agents use `RAG` to ground their responses in directly relevant information, ensuring accuracy for knowledge-intensive tasks. We've pioneered the `RAG 2.0` approach, which outperforms traditional `RAG` systems by optimizing the system end-to-end. [Read more in our blog post](https://contextual.ai/introducing-rag2/).
## Agent
Contextual RAG Agents are optimized end-to-end to deliver exceptional accuracy on complex and knowledge-intensive tasks. `Agents` make intelligent decisions on how to accomplish the tasks, and can take multiple steps to do so. The agentic approach enables a wide range of actions, such as providing standard retrieval-based answers, declining to respond when no relevant information is available, or generating and executing SQL queries when working with structured data. The adaptability and further tuning of `Agents` greatly increase their value for knowledge-intensive tasks.
## Query / Prompt
The question that you submit to an `Agent`. You can submit a `Query` to your `Agent` [via our API](/api-reference/agents-query/query).
## Response
`Response` is the output generated by an `Agent` in response to a `Query`. `Responses` come with the relevant retrieved content (`Knowledge`) and in-line citations (`Attributions`).
## Knowledge
The data retrieved by the `Agent` from the `Datastore` to generate its response. When working with unstructured data, `Knowledge` comes in the form of a list of `Document` chunks that are relevant to the `Query`.
## Case
A `Case` is a row of data. It is either a `Prompt` and `Reference` (gold-standard answer) pair, or a `Prompt`, `Response`, and `Knowledge` triplet. `Evaluation` datasets follow the former schema, while `Tuning` datasets require the latter.
## Attribution
`Attributions` are in-line citations that credit the specific sources of information used by the model to generate a response. When querying Contextual Agents, `Attributions` are included for each claim made in the response. These attributions can be accessed via the query API response or viewed in the UI by hovering over in-line tooltips next to each claim (e.g., \[1], \[2]).
## System Prompt
Instructions that guide an Agent's response generation, helping define its behavior and capabilities. You can set and modify the `System Prompt` when creating or editing an Agent [via our APIs](/api-reference/agents/edit-agent).
## Document
A `Document` is a unit of unstructured data ingested into a `Datastore`, which can be queried and used as the basis for generating responses. Today, we support both `pdf` and `html` files, and plan to expand support to other data types. You can ingest `Documents` into a `Datastore` via our API. After ingestion, `Documents` are automatically parsed, chunked, and processed by our platform.
## Datastore
A repository of data associated with an `Agent`. An `Agent` retrieves relevant data from its associated `Datastores` to generate responses. An `Agent` can connect to multiple `Datastores`, and each `Datastore` can serve multiple `Agents`. You can associate a `Datastore` with an `Agent` when creating or editing an Agent [via our APIs](/api-reference/agents/edit-agent). We also provide [a set of APIs](/api-reference/datastores) for creating and managing `Datastores`.
## Dataset
The `Dataset` object can be used to store labelled data `cases`. A `case` is either a (i) `Prompt`-`Reference` pair, or a (ii) `Prompt`-`Reference`-`Knowledge` triplet. `Datasets` can be used for `Evaluation` or `Tuning`. You can create a new `Dataset` by uploading a CSV or JSONL file via our API.
The `Dataset` object can also store evaluation results. Once an evaluation job is completed, it returns a `Dataset` containing the original `Cases` from the evaluation, now appended with results such as `Equivalence` and `Groundedness` scores for each `Case`.
## LMUnit
An evaluation method using natural language unit tests to assess specific criteria in an Agent's responses. You can define and evaluate clear, testable statements or questions that capture desirable fine-grained qualities of the Agent's response — such as “Is the response succinct without omitting essential information?” or “Is the complexity of the response appropriate for the intended audience?” You can create and run these unit tests [via our API](/api-reference/lmunit/lmunit). Read more about `LMUnit` in [our blog post](https://contextual.ai/blog/lmunit/).
## Workspace
An organizational unit that owns and manages `Agents`, `Datastores`, and other resources within the system. Contextual AI uses `Workspaces` to organize and manage resources, with API keys associated with specific workspaces.
# LLMs.txt
Source: https://docs.contextual.ai/user-guides/llms
Contextual AI documentation in LLM-friendly format
Contextual AI documentation is provided in the `llms.txt` format — a standardized, LLM-friendly structure that makes it easy for large language models to parse and reference.
## Available Formats
We provide two versions of our documentation:
**[llms.txt](https://docs.contextual.ai/llms.txt)**: A concise overview containing brief descriptions of key features and direct links to detailed documentation sections.
**[llms-full.txt](https://docs.contextual.ai/llms-full.txt)**: A comprehensive version that includes the same structure as llms.txt but with expanded details for each section.
# MCP Server
Source: https://docs.contextual.ai/user-guides/mcp-server
Integrate Contextual AI with MCP-compatible clients like Cursor IDE and Claude Desktop
A Model Context Protocol (MCP) server that provides RAG (Retrieval-Augmented Generation) capabilities using Contextual AI. This server integrates with a variety of MCP clients. In this documentation, we will show integration with both Cursor IDE and Claude Desktop.
## Overview
This MCP server acts as a bridge between AI interfaces (Cursor IDE or Claude Desktop) and a specialized Contextual AI agent. It enables:
1. **Query Processing**: Direct your domain specific questions to a dedicated Contextual AI agent
2. **Intelligent Retrieval**: Searches through comprehensive information in your knowledge base
3. **Context-Aware Responses**: Generates answers that are:
   * Grounded in source documentation
   * Backed by citations and attributions
   * Consistent with the conversation context
## Local MCP server
### Prerequisites
* Python 3.10 or higher
* Cursor IDE and/or Claude Desktop
* Contextual AI API key
* MCP-compatible environment
### Installation
1. Clone the repository:
```bash
git clone https://github.com/ContextualAI/contextual-mcp-server.git
cd contextual-mcp-server
```
2. Create and activate a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate # On Windows, use `.venv\Scripts\activate`
```
3. Install dependencies:
```bash
pip install -e .
```
### Configuration
#### Configure MCP Server
The server requires some configuration before use.
For example, the single-use server should be customized with an appropriate docstring for your RAG Agent.
The docstring for your query tool is critical as it helps the MCP client understand when to route questions to your RAG agent. Make it specific to your knowledge domain. Here is an example:
```
A research tool focused on financial data on the largest US firms
```
or
```
A research tool focused on technical documents for Omaha semiconductors
```
The server also requires the following settings from your RAG Agent:
* `API_KEY`: Your Contextual AI API key
* `AGENT_ID`: Your Contextual AI agent ID
If you'd like to store these values in a `.env` file, you can specify them like so:
```bash
cat > .env << EOF
API_KEY=key...
AGENT_ID=...
EOF
```
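If you go this route, reading the values back in Python needs no extra dependencies; a minimal parser for simple `KEY=VALUE` lines is enough (the `python-dotenv` package offers a more complete version of the same idea):

```python
def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Blank lines and `#` comments are skipped; values are not unquoted.
    """
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return values

# config = load_env()
# api_key, agent_id = config["API_KEY"], config["AGENT_ID"]
```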
#### AI Interface Integration
This MCP server can be integrated with a variety of clients. To use it with either Cursor IDE or Claude Desktop, create or modify the MCP configuration file in the appropriate location:
1. First, find the path to your `uv` installation:
```bash
UV_PATH=$(which uv)
echo $UV_PATH
# Example output: /Users/username/miniconda3/bin/uv
```
2. Create the configuration file using the full path from step 1. `$UV_PATH` expands to that path when the file is written; the `${workspaceFolder}` placeholder stays literal and is replaced with your project path by the MCP client (comments are omitted because JSON does not allow them):
```bash
cat > mcp.json << EOF
{
  "mcpServers": {
    "ContextualAI-TechDocs": {
      "command": "$UV_PATH",
      "args": [
        "--directory",
        "\${workspaceFolder}",
        "run",
        "multi-agent/server.py"
      ]
    }
  }
}
EOF
```
3. Move the file to the correct location; see below for options:
```bash
mkdir -p .cursor/
mv mcp.json .cursor/
```
Configuration locations:
* For Cursor:
* Project-specific: `.cursor/mcp.json` in your project directory
* Global: `~/.cursor/mcp.json` for system-wide access
* For Claude Desktop:
* Use the same configuration file format in the appropriate Claude Desktop configuration directory
#### Environment Setup
This project uses `uv` for dependency management, which provides faster and more reliable Python package installation.
### Usage
The server provides Contextual AI RAG capabilities using the Python SDK, exposing commands that are accessible from MCP clients such as Cursor IDE and Claude Desktop.
The current server focuses on the query command from the Contextual AI Python SDK, but you could extend it to support other features such as listing all agents, updating retrieval settings, updating prompts, extracting retrievals, or downloading metrics.
For example, in Cursor, you might ask:
```
Show me the code for initiating the RF345 microchip?
```
The MCP client will
1. Determine if this should be routed to the MCP Server
Then the MCP server will
1. Route the query to the Contextual AI agent
2. Retrieve relevant documentation
3. Generate a response with specific citations
4. Return the formatted answer to Cursor
## Remote MCP server
Cursor supports remote MCP servers via Server-Sent Events (SSE). This allows you to connect directly to hosted MCP services without local installation.
### Configuration
1. Create the MCP configuration file in one of these locations:
* Project-specific: `.cursor/mcp.json` in your project directory
* Global: `~/.cursor/mcp.json` for system-wide access
2. Add the configuration:
```json
{
"mcpServers": {
"ContextualAI": {
"type": "sse",
"url": "https://mcp.app.contextual.ai/mcp/",
"headers": {
"Accept": "text/event-stream"
}
}
}
}
```
3. Restart Cursor IDE
### Usage
#### Cursor IDE chat
In Cursor's chat interface, you might ask:
```
Use RAG platform with api_key key-YOUR_API_KEY and agent_id YOUR_AGENT_ID, show me the code for initiating the RF345 microchip.
```
The MCP client will route the query to the Contextual AI agent and generate a response to Cursor.
#### Test script to check for connectivity
You may also run a short test script to verify the connection in Cursor (requires your Contextual AI API key and Agent ID):
```python
#!/usr/bin/env python3
"""
Quick test script for the Contextual AI HTTP MCP Server

Replace the following placeholders with your actual values:
- YOUR_API_KEY: Your Contextual AI API key
- YOUR_AGENT_ID: Your Contextual AI agent ID
- YOUR_QUERY: Your test question for the agent
"""
import asyncio

from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

# Server configuration
SERVER_URL = "https://mcp.app.contextual.ai/mcp/"
API_KEY = "key-YOUR_API_KEY"


def extract_result(result):
    """Extract text from FastMCP result format"""
    if isinstance(result, list) and len(result) > 0:
        return result[0].text
    return str(result)


async def quick_test():
    """Quick test of essential functionality"""
    print("=== Quick ContextualAI HTTP MCP Test ===\n")
    try:
        # Use Streamable HTTP transport
        transport = StreamableHttpTransport(
            url=SERVER_URL,
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Content-Type": "application/json"
            }
        )
        async with Client(transport) as client:
            # Test 1: Server connection
            print("🔌 Testing server connection...")
            await client.ping()
            print("✅ Server is reachable")

            # Test 2: List tools
            tools = await client.list_tools()
            print(f"✅ Available tools: {[tool.name for tool in tools]}")

            # Test 3: Query the agent
            print("\n🔍 Testing query:")
            print("Query: 'YOUR_QUERY'")
            result = await client.call_tool("query", {
                "prompt": "YOUR_QUERY",
                "agent_id": "YOUR_AGENT_ID",
                "api_key": API_KEY
            })
            response = extract_result(result)
            print(f"Response: {response}")
            print("\n" + "=" * 80 + "\n")
            print("🎉 All tests completed successfully!")
    except Exception as e:
        print(f"❌ Test failed: {e}")
        print("Please check:")
        print("- Server URL is correct")
        print("- API key is valid")
        print("- Network connectivity")
        print("- FastMCP is installed: pip install fastmcp")


if __name__ == "__main__":
    asyncio.run(quick_test())
```
To run the test:
1. Install FastMCP: `pip install fastmcp`
2. Replace placeholders with your actual values
3. Run: `python test_script.py`
# Snowflake Native Application
Source: https://docs.contextual.ai/user-guides/snowflake
This guide explains how to install and use the Contextual AI Snowflake Native App
Contextual AI provides a platform for creating enterprise-grade AI agents, grounded in your documents and data. In a few simple steps, you can install Contextual AI as a Native Snowflake Application and create powerful AI agents in the security of your Snowflake environment.
***
## Installation
1. Obtain Contextual AI from the [Snowflake Marketplace](https://app.snowflake.com/marketplace).
2. Grant the "create compute pool" and "bind service endpoint" privileges to the application in Snowsight. Please note, this step may require up to 1 hour.
3. Once the application has been installed, the application's Snowsight settings / security view will look like the image below; your application is now ready for use!

***
## Using the application
Once you have installed the application, click "Launch App" to enter the
Contextual AI Application.
The first user to load into the application will be designated as the "Admin
User" so it is highly recommended that your team's administrator be the
first user to load the application.
Once you have logged in with your Snowflake credentials, you will be
able to see the Contextual AI Dashboard and create your first AI Agent.
The first time you use the application, you will need to create a new agent
and create a new datastore for it.

Click "Create" at the bottom of the Create Agents page to create and
save your first AI Agent.
### Adding Knowledge to Your Agent's Datastore
1. Navigate to the Datastores tab and locate your agent's newly created Datastore.
2. Open the "Documents" page within the Datastore.
3. Select the "Ingest" tab to upload your PDF documents.
Once you upload a document, the system will automatically begin processing it in the background. This process extracts information from your documents and prepares them for future retrieval by your agents.
After your documents are processed, return to the Agents page and click "Chat" to start interacting with your agent. Your agent can now access and reference the knowledge from your uploaded documents during conversations.
***
## API (programmatic) access
Contextual AI's application provides a REST API for programmatic interaction with your agents and datastore. After you have created and configured an agent (and the agent's datastore) through the UI, you can integrate it into your applications and workflows using API endpoints. Below are the steps to get started with API access:
API Access in the Contextual AI Native App requires obtaining the API endpoint of your instance, which can be found by running this Snowflake query in a Snowflake worksheet or via the Snowflake CLI:
```sql
CALL CONTEXTUAL_NATIVE_APP.CORE.GET_API_ENDPOINT()
```
You will receive a response in the form of a hostname: `xxxxx-xxxxx-xxxxx.snowflakecomputing.app`. This value is the backend API endpoint for your application.
To build the full API base URL, prepend `https://` to this hostname and append `/v1`.
An example of how to do this in Python:
```python
SF_BASE_URL = 'xxxxx-xxxxx-xxxxx.snowflakecomputing.app' # what you will receive from GET_API_ENDPOINT()
BASE_URL = f'https://{SF_BASE_URL}/v1' # using python3 f-string formatting
```
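If you are scripting this step, the construction can be wrapped in a small helper that also guards against a value that already includes a scheme or a trailing slash (the helper name is ours, not part of the SDK):

```python
def build_api_base_url(sf_endpoint: str) -> str:
    """Build the full API base URL from the hostname returned by GET_API_ENDPOINT()."""
    # Strip a scheme if one was accidentally included, and drop any trailing slash.
    host = sf_endpoint.removeprefix('https://').removeprefix('http://').rstrip('/')
    return f'https://{host}/v1'

print(build_api_base_url('xxxxx-xxxxx-xxxxx.snowflakecomputing.app'))
# https://xxxxx-xxxxx-xxxxx.snowflakecomputing.app/v1
```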
For authentication, instead of using an API key, the Snowflake Native App version of Contextual AI uses a Snowflake token, which can be retrieved using the following Python code:
```python
import snowflake.connector

ctx = snowflake.connector.connect(
    user="",      # Snowflake account user
    password="",  # Snowflake account password
    account="organization-account",  # format: <organization>-<account> (e.g., myorg-account123)
    session_parameters={
        'PYTHON_CONNECTOR_QUERY_RESULT_FORMAT': 'json'
    },
)
# Obtain a session token.
token_data = ctx._rest._token_request('ISSUE')
token_extract = token_data['data']['sessionToken']
# Wrap the token in double quotes for use as the API key.
api_key = f'"{token_extract}"'
```
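Note that `api_key` is the session token wrapped in literal double-quote characters; Snowflake's ingress authentication expects the token in quoted form (`Token="<token>"`), so the quotes are part of the credential, not decoration. A minimal illustration with a placeholder token:

```python
token_extract = 'eyJraWQiOi.placeholder.token'  # placeholder, not a real session token
api_key = f'"{token_extract}"'

# The resulting value literally starts and ends with a double-quote character.
print(api_key)  # "eyJraWQiOi.placeholder.token"
```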
Once you have your API key, you can combine the steps above to create a Contextual AI client in Python, configured to send programmatic queries to the agents in your Contextual AI Native App in Snowflake.
```python
import snowflake.connector
from contextual import ContextualAI  # Contextual AI Python SDK

SF_BASE_URL = 'xxxxx-xxxxx-xxxxx.snowflakecomputing.app'  # returned by GET_API_ENDPOINT()
BASE_URL = f'https://{SF_BASE_URL}/v1'

ctx = snowflake.connector.connect(
    user="",      # Snowflake account user
    password="",  # Snowflake account password
    account="organization-account",  # format: <organization>-<account>
    session_parameters={
        'PYTHON_CONNECTOR_QUERY_RESULT_FORMAT': 'json'
    },
)
# Obtain a session token and wrap it in double quotes for use as the API key.
token_data = ctx._rest._token_request('ISSUE')
api_key = f'"{token_data["data"]["sessionToken"]}"'

# Create a client pointed at the Native App's API endpoint.
client = ContextualAI(api_key=api_key, base_url=BASE_URL)

# List agents to verify API access.
agents = list(client.agents.list())
```
We recommend using the Contextual AI Python SDK to interact with the API. An example script combining our Python SDK with the Snowflake Native App can be found [here](https://github.com/ContextualAI/contextual-client-python/blob/main/examples/snowflake_native_app_example.py).