This article describes the key parameters that are available when configuring RAG agents on the Contextual AI platform. Sensible defaults are applied for all settings, but they can be modified based on your preferences or to optimize performance against your data and query patterns.

Standard Parameters

  • Datastores (datastore_ids): Datastores are the knowledge bases that your agent can access when answering queries. Files uploaded into a datastore are processed using Contextual’s multi-modal document understanding pipeline, which prepares documents in ways optimized for end-to-end RAG performance. You must link at least one datastore to your agent, but you can specify more.
  • Generator Model (llm_model_id): Determines which generator model powers your agent. You can use either our default Grounded Language Model or a version that has been specifically tuned to your use case. Tuned models can only be used with the agents on which they were tuned.
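
The two standard parameters above can be supplied together when creating an agent. The following is a minimal sketch of a JSON-style configuration payload: the field names come from this article, while the agent name, datastore IDs, and model ID are placeholders, and the surrounding request shape may differ from the actual Contextual AI API.

```python
# Illustrative agent configuration; IDs below are placeholders.
agent_config = {
    "name": "support-agent",                # hypothetical agent name
    "datastore_ids": ["ds_abc", "ds_def"],  # at least one datastore is required
    "llm_model_id": "default",              # or the ID of a model tuned for this agent
}

# An agent must be linked to at least one datastore.
assert len(agent_config["datastore_ids"]) >= 1
```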

System Prompts

Note on Adherence: Contextual has built agents to faithfully follow instructions. However, in some cases complete adherence cannot be guaranteed, especially where instructions are unclear, under-specified, or in conflict with other instructions you have given or our guardrails.

The System Prompts instruct the agent on how to respond to users’ queries given the retrieved knowledge. The appropriate prompt is passed, along with the user query and relevant retrievals, to the Generator Model at generation-time.

  • Core System Prompt (system_prompt): Defines how the agent interprets queries and generates responses. You can provide instructions about the agent’s persona and style, and the desired content and structure of the responses.
  • No Retrieval System Prompt (no_retrieval_system_prompt): Defines the agent’s behavior if, after the retrieval, reranking, and filter steps, no relevant knowledge has been identified. You can use this prompt to define boilerplate refusals, offer help and guidance, provide information about the document store, and specify other contextually-appropriate ways that the agent should respond.
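Both prompts can be set on the same agent, so that behavior is defined for the retrieval and no-retrieval cases alike. The sketch below shows one plausible pairing; the prompt wording is purely illustrative, and only the two field names are taken from this article.

```python
# Illustrative system prompts; the wording is an example, not a recommendation.
agent_config = {
    # Used when relevant chunks survive retrieval, reranking, and filtering.
    "system_prompt": (
        "You are a concise product-support assistant. Answer using only the "
        "retrieved documentation, and cite the source document for each claim."
    ),
    # Used when no relevant knowledge is identified after those steps.
    "no_retrieval_system_prompt": (
        "No relevant documentation was found. Briefly say so, and suggest "
        "that the user rephrase the question or ask about the product manuals."
    ),
}
```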

Query Understanding

These settings affect whether and how user queries are modified to improve retrieval performance and response generation.

  • Enable Multi-turn (enable_multi_turn): Allows the agent to remember and reference previous parts of the conversation, making interactions feel more natural and continuous. When enabled, the user’s query will automatically be reformulated based on prior turns to resolve ambiguities. The conversation history is prepended to the query at generation-time.
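
The reformulation happens automatically on the platform when enable_multi_turn is set; the sketch below is a hand-written example of the kind of disambiguation involved, using a hypothetical product name.

```python
# Illustrative multi-turn reformulation (performed by the platform, not by you).
history = [
    {"role": "user", "content": "What is the X100 router?"},   # hypothetical product
    {"role": "assistant", "content": "The X100 is a small-office router..."},
]

raw_query = "How do I reset it?"
# The ambiguous "it" is resolved from the conversation history before retrieval:
reformulated_query = "How do I reset the X100 router?"
```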

Retrieval

These settings determine how the agent performs the initial retrieval from linked unstructured datastores.

  • Number of Retrieved Chunks (top_k_retrieved_chunks): The maximum number of chunks that should be retrieved from the linked datastore(s). For example, if set to 10, the agent will retrieve at most 10 relevant chunks to generate its response. The value ranges from 1 to 200, and the default value is 100.
  • Lexical Search Weight (lexical_alpha): When chunks are scored during retrieval (based on their relevance to the user query), this controls how much weight the scoring gives to exact keyword matches. A higher value means the agent will prioritize finding exact word matches, while a lower value allows for more flexible, meaning-based matching. The value ranges from 0 to 1, and we default to 0.1. You can increase this weight if exact terminology, specific entities, or specialized vocabulary is important for your use case.
  • Semantic Search Weight (semantic_alpha): When chunks are scored during retrieval (based on their relevance to the user query), this controls how much weight the scoring gives to semantic similarity. Semantic searching looks for text that conveys similar meaning and concepts, rather than just matching exact keywords or phrases. The value ranges from 0 to 1, and we default to 0.9. The value of the semantic search weight and lexical search weight must sum to 1.
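
The platform's internal scoring is not specified in this article, but conceptually the two weights blend a keyword-match score with a semantic-similarity score. The sketch below is an assumed linear combination for illustration only, including the constraint that the two weights must sum to 1.

```python
def hybrid_score(lexical_score: float, semantic_score: float,
                 lexical_alpha: float = 0.1, semantic_alpha: float = 0.9) -> float:
    """Illustrative hybrid retrieval score.

    Assumes a simple weighted sum; the platform's actual scoring may differ.
    Defaults mirror the documented defaults (lexical 0.1, semantic 0.9).
    """
    if abs(lexical_alpha + semantic_alpha - 1.0) > 1e-9:
        raise ValueError("lexical_alpha and semantic_alpha must sum to 1")
    return lexical_alpha * lexical_score + semantic_alpha * semantic_score
```

With the defaults, a chunk scoring 1.0 on semantic similarity but 0.0 on keyword overlap still receives a hybrid score of 0.9, which is why raising lexical_alpha matters when exact terminology is critical.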

Rerank and Filter

These settings affect how the agent reranks and filters chunks before passing them to the generator model.

  • Enable Reranking (enable_rerank): Allows the agent to take the initially retrieved document chunks and rerank them based on the provided instructions and user query. The top reranked chunks are passed on for filtering and final response generation.
    • Rerank instructions (rerank_instructions): Natural language instructions that describe your preferences for the ranking of chunks, such as prioritizing chunks from particular sources or time periods. Chunks will be rescored based on these instructions and the user query. If no instruction is given, the rerank scores are based only on relevance of chunks to the query.
    • Number of Reranked Chunks (top_k_reranked_chunks): The number of top reranked chunks that are passed on for generation.
    • Reranker score filter (reranker_score_filter_threshold): If set to a value greater than 0, chunks with relevance scores below that threshold are not passed on. The value must be between 0 and 1.
  • Enable Filtering (enable_filter): Allows the agent to perform a final filtering step prior to generation. When enabled, chunks are checked against the filter prompt and irrelevant chunks are filtered out. This acts like a final quality control checkpoint, helping to ensure that only relevant chunks are passed to the generator. This filter can improve response accuracy and relevance, but also increase the false refusal rate if the configuration is too strict.
    • Filter Prompt (filter_prompt): Natural language instructions that describe the criteria for relevant and irrelevant chunks. It can be used in tandem with, or as an alternative to, the reranker score-based filtering.
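
A combined rerank-and-filter configuration might look like the following sketch. Field names are taken from this article; the instruction text, threshold, and chunk count are illustrative choices, and the surrounding request shape is assumed.

```python
# Illustrative rerank-and-filter settings; values are examples, not recommendations.
rerank_filter_config = {
    "enable_rerank": True,
    "rerank_instructions": "Prioritize chunks from documents updated in the last year.",
    "top_k_reranked_chunks": 20,             # chunks passed on after reranking
    "reranker_score_filter_threshold": 0.3,  # drop chunks scoring below 0.3
    "enable_filter": True,
    "filter_prompt": "Keep only chunks that directly discuss product configuration.",
}

# The reranker score threshold must lie between 0 and 1.
assert 0 <= rerank_filter_config["reranker_score_filter_threshold"] <= 1
```

A stricter filter_prompt or higher threshold removes more marginal chunks, but, as noted above, can raise the false refusal rate.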

Generate

These settings affect how the generator model produces responses.  

  • Max New Tokens (max_new_tokens): Controls the maximum length of the agent’s response. Defaults to 2,048 tokens.
  • Temperature (temperature): Controls how creative the agent’s responses are. A higher temperature means more creative and varied responses, while a lower temperature results in more consistent, predictable answers. It ranges from 0 to 1. Defaults to 0.
  • Top P (top_p): Similar to temperature, this parameter also controls response variety. It determines how many different word choices the agent considers when generating its response. Defaults to 0.9.
  • Frequency Penalty (frequency_penalty): Helps prevent repetition in responses by making the agent less likely to use words it has already used frequently. This helps ensure more natural, varied language. Defaults to 0.
  • Random Seed (seed): Controls randomness in how the agent selects the next tokens during text generation. Allows for reproducible results, which can be useful for testing.
  • Enable Groundedness Scores (calculate_groundedness): Enables the agent to provide groundedness scores as part of its response. When enabled, the agent identifies distinct claims in the response and assesses whether each one is grounded in the retrieved document chunks. Claims that are not grounded are shown in yellow in the UI. Defaults to off.
  • Disable Commentary (avoid_commentary): Flag that indicates whether the agent should output only strictly factual information grounded in the retrieved knowledge, rather than a complete response (which can include commentary, analysis, etc.).
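
Pulling the generation settings together, a configuration using the documented defaults plus a fixed seed might look like this sketch. Only the field names and default values come from this article; the seed value and overall payload shape are assumptions.

```python
# Illustrative generation settings, using the documented defaults where stated.
generation_config = {
    "max_new_tokens": 2048,        # default maximum response length
    "temperature": 0,              # default: deterministic, consistent answers
    "top_p": 0.9,                  # default nucleus-sampling cutoff
    "frequency_penalty": 0,        # default: no repetition penalty
    "seed": 42,                    # arbitrary fixed seed for reproducible testing
    "calculate_groundedness": True,
    "avoid_commentary": False,
}
```

Fixing the seed while keeping temperature at 0 is a common pattern for regression-testing agent responses, since both settings reduce run-to-run variation.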

Miscellaneous

  • Suggested queries (suggested_queries): Example queries that appear in the user interface when first interacting with the agent. You can provide both simple and complex examples to help users understand the full range of questions your agent can handle. This helps set user expectations and guides them toward effective interactions with the agent.