Key concepts to understand when working with the Contextual APIs.


RAG

Retrieval-Augmented Generation (RAG) is a technique that improves language model generation by incorporating external knowledge. Contextual Agents use RAG to ground their responses in directly relevant information, which improves accuracy on knowledge-intensive tasks. We’ve pioneered the RAG 2.0 approach, which outperforms traditional RAG systems by optimizing the system end-to-end. Read more in our blog post.


Agent

Contextual RAG Agents, powered by our RAG 2.0 technology, are optimized end-to-end to deliver exceptional accuracy on complex and knowledge-intensive tasks. Agents make intelligent decisions about how to accomplish a task and can take multiple steps to do so. The agentic approach enables a wide range of actions, such as providing standard retrieval-based answers, declining to respond when no relevant information is available, or generating and executing SQL queries when working with structured data. The adaptability of Agents, together with the option to tune them further, greatly increases their value for knowledge-intensive tasks.

Query / Prompt

The question that you submit to an Agent. You can submit a Query to your Agent via our API.


Response

Response is the output generated by an Agent in response to a Query. Responses come with the relevant retrieved content (Knowledge) and in-line citations (Attributions).


Knowledge

The data retrieved by the Agent from the Datastore to generate its response. When working with unstructured data, Knowledge comes in the form of a list of Document chunks that are relevant to the Query.


Case

A Case is a row of data. It is either a Prompt and Reference (gold-standard answer) pair, or a Prompt, Response, and Knowledge triplet. Evaluation datasets follow the former schema, while Tuning datasets require the latter.


Attributions

Attributions are in-line citations that credit the specific sources of information used by the model to generate a response. When querying Contextual Agents, Attributions are included for each claim made in the response. Attributions can be accessed via the query API response or viewed in the UI by hovering over the in-line tooltips next to each claim (e.g., [1], [2]).
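As a rough illustration of how in-line citations can be tied back to their sources, the sketch below extracts markers such as [1] and [2] from a response string and looks them up in a list of retrieved chunks. The `response_text` and `knowledge` structures and the `resolve_citations` helper are hypothetical and are not the actual shape of the query API response.

```python
import re


def resolve_citations(response_text, knowledge):
    """Map each in-line marker like [1] to the chunk it cites (1-based)."""
    cited = {}
    for match in re.finditer(r"\[(\d+)\]", response_text):
        index = int(match.group(1))
        # Ignore markers that point outside the list of retrieved chunks.
        if 1 <= index <= len(knowledge):
            cited[index] = knowledge[index - 1]
    return cited


# Hypothetical data for illustration only.
knowledge = [
    {"document": "handbook.pdf", "text": "Employees accrue 20 vacation days per year."},
    {"document": "policy.html", "text": "Unused days expire on March 31."},
]
response_text = (
    "You accrue 20 vacation days per year [1]. "
    "Unused days expire on March 31 [2]."
)

for index, chunk in sorted(resolve_citations(response_text, knowledge).items()):
    print(f"[{index}] {chunk['document']}")
```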

System Prompt

Instructions that guide an Agent’s response generation, helping define its behavior and capabilities. You can set and modify the System Prompt when creating or editing an Agent via our APIs.


Document

A Document is a unit of unstructured data ingested into a Datastore, which can be queried and used as the basis for generating responses. Today, we support both PDF and HTML files, and plan to expand support to other data types. You can ingest Documents into a Datastore via our API. After ingestion, Documents are automatically parsed, chunked, and processed by our platform.


Datastore

A repository of data associated with an Agent. An Agent retrieves relevant data from its associated Datastores to generate responses. An Agent can connect to multiple Datastores, and each Datastore can serve multiple Agents. You can associate a Datastore with an Agent when creating or editing an Agent via our APIs. We also provide a set of APIs for creating and managing Datastores.
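The many-to-many relationship between Agents and Datastores can be pictured with a small in-memory model. This is only a conceptual sketch: the `Agent` and `Datastore` classes, their fields, and the `searchable_documents` method are hypothetical and are not part of the Contextual API or SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    name: str
    documents: list = field(default_factory=list)


@dataclass
class Agent:
    name: str
    system_prompt: str
    datastores: list = field(default_factory=list)

    def searchable_documents(self):
        """Collect the documents from every Datastore linked to this Agent."""
        return [doc for store in self.datastores for doc in store.documents]


hr_docs = Datastore("hr-docs", ["handbook.pdf"])
finance_docs = Datastore("finance-docs", ["q3-report.pdf", "budget.html"])

# One Agent linked to two Datastores, and one Datastore shared by two Agents.
general_agent = Agent("general", "Answer using company documents.", [hr_docs, finance_docs])
finance_agent = Agent("finance", "Answer finance questions only.", [finance_docs])

print(general_agent.searchable_documents())
print(finance_agent.searchable_documents())
```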


Dataset

The Dataset object can be used to store labelled Cases. A Case is either (i) a Prompt-Reference pair or (ii) a Prompt-Reference-Knowledge triplet. Datasets can be used for Evaluation or Tuning. You can create a new Dataset by uploading a CSV or JSONL file via our API.
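For illustration, the sketch below serializes two Cases, one of each shape, as JSONL (one JSON object per line). The field names `prompt`, `reference`, and `knowledge` and the helper functions are assumptions made for this example; check the API reference for the exact schema an upload expects.

```python
import json

# Hypothetical Cases: a Prompt-Reference pair and a Prompt-Reference-Knowledge triplet.
cases = [
    {
        "prompt": "How long is the warranty?",
        "reference": "The warranty lasts two years.",
    },
    {
        "prompt": "What does the warranty cover?",
        "reference": "It covers battery defects.",
        "knowledge": ["The warranty lasts two years and covers battery defects."],
    },
]


def validate_case(row):
    """Return True if the row has exactly the fields of one of the two Case shapes."""
    keys = set(row)
    return keys in ({"prompt", "reference"}, {"prompt", "reference", "knowledge"})


def to_jsonl(rows):
    """Serialize rows as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


assert all(validate_case(row) for row in cases)
jsonl_text = to_jsonl(cases)
print(jsonl_text)
```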

The Dataset object can also store evaluation results. Once an evaluation job is completed, it returns a Dataset containing the original Cases from the evaluation, now appended with results such as Equivalence and Groundedness scores for each Case.


Evaluation

You can evaluate your Contextual Agents in two ways:

Firstly, you can assess the Agent’s performance using either a Dataset or an evaluation file. The evaluation file is a set of Prompts (questions) and References (gold-standard answers).

You can create and manage Evaluation jobs via the API. Once an Evaluation job is complete, you can view the evaluation metrics and the row-by-row results. The row-by-row results are stored in a new Dataset object (see Dataset). We currently support two evaluation metrics:

  • Equivalence: Uses a Language Model as a judge to evaluate if the Agent’s response is equivalent to the Reference (or gold-standard answer).
  • Groundedness: Decomposes the Agent’s response into claims, and then uses a Language Model to evaluate if the claims are grounded in the retrieved documents.
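The decompose-then-judge idea behind Groundedness can be sketched as follows. This is a toy approximation under stated assumptions, not the platform's implementation: claims are taken to be sentences, and the Language Model judge is swapped for a simple word-overlap check. All function names are hypothetical.

```python
import re


def split_into_claims(response):
    """Naive claim decomposition: treat each sentence as one claim."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]


def is_grounded(claim, documents):
    """Stand-in for the Language Model judge.

    A claim counts as grounded only if every word in it appears
    in a single retrieved document.
    """
    claim_words = set(re.findall(r"\w+", claim.lower()))
    return any(
        claim_words <= set(re.findall(r"\w+", doc.lower())) for doc in documents
    )


def groundedness(response, documents):
    """Fraction of claims in the response that are judged grounded."""
    claims = split_into_claims(response)
    if not claims:
        return 0.0
    return sum(is_grounded(claim, documents) for claim in claims) / len(claims)


documents = ["The warranty lasts two years and covers battery defects."]
response = "The warranty lasts two years. The warranty covers water damage."
print(groundedness(response, documents))
```

In this example the first claim is supported by the document and the second is not, so half of the claims are grounded.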

Secondly, you can write natural language unit tests to assess specific criteria in an Agent’s responses. See the LMUnit section below for more information.


LMUnit

An evaluation method that uses natural language unit tests to assess specific criteria in an Agent’s responses. You can define and evaluate clear, testable statements or questions that capture desirable fine-grained qualities of the Agent’s response, such as “Is the response succinct without omitting essential information?” or “Is the complexity of the response appropriate for the intended audience?” You can create and run these unit tests via our API. Read more about LMUnit in our blog post.
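One way to organize such tests in code is to keep them as plain strings and pass each one, together with the Query and Response, to a scoring function. The sketch below is hypothetical: `run_unit_tests` and `placeholder_scorer` are illustrative names, the 0-to-1 scale is arbitrary, and the placeholder stands in for a real model call made through the API.

```python
unit_tests = [
    "Is the response succinct without omitting essential information?",
    "Is the complexity of the response appropriate for the intended audience?",
]


def placeholder_scorer(query, response, unit_test):
    """Stand-in for a model call: gives 1.0 to short responses, else 0.0.

    It ignores the unit test text entirely; a real scorer would judge
    the response against the specific criterion.
    """
    return 1.0 if len(response.split()) <= 50 else 0.0


def run_unit_tests(query, response, tests, scorer):
    """Score one response against each natural language unit test."""
    return {test: scorer(query, response, test) for test in tests}


results = run_unit_tests(
    "How long is the warranty?",
    "The warranty lasts two years.",
    unit_tests,
    placeholder_scorer,
)
for test, score in results.items():
    print(f"{score:.1f}  {test}")
```

Keeping the scorer as a parameter makes it straightforward to replace the placeholder with a real scoring call without changing how the tests are defined.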


Reference

Reference is the gold-standard answer used in Evaluation to compare against an Agent’s generated response. When launching an Evaluation job, you must provide a dataset that has a column for Reference (see Evaluation).


Tuning

The process of specializing an Agent to improve its performance on specific tasks or domains while aligning it with your unique requirements. Tuning enables tailored optimization to better meet your use case. You can create and manage tuning jobs through the API.

Once tuning is complete, the tuned model can be deployed to an Agent using the Edit Agent API. Currently, only one tuned model can be deployed per tenant at a time.


Tenant

An organizational unit that owns and manages Agents, Datastores, and other resources within the system. Contextual AI uses Tenants to organize and manage resources, with API keys associated with specific tenants.