Key concepts to understand when working with the Contextual APIs.
RAG
Retrieval Augmented Generation (RAG) is a technique that improves language model generation by incorporating external knowledge. Contextual Agents use RAG to ground their responses in directly relevant information, ensuring accuracy for knowledge-intensive tasks. We’ve pioneered the RAG 2.0 approach, which outperforms traditional RAG systems by optimizing the system end-to-end. Read more in our blog post.
Agent
Contextual RAG Agents - powered by our RAG 2.0 technology - are optimized end-to-end to deliver exceptional accuracy on complex, knowledge-intensive tasks. Agents make intelligent decisions about how to accomplish their tasks and can take multiple steps to do so. This agentic approach enables a wide range of actions, such as providing standard retrieval-based answers, declining to respond when no relevant information is available, or generating and executing SQL queries when working with structured data. The adaptability and further tuning of Agents greatly increase their value for knowledge-intensive tasks.
Query / Prompt
The question that you submit to an Agent. You can submit a Query to your Agent via our API.
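For illustration, here is a minimal sketch of submitting a Query over HTTP with Python and the requests library. The base URL, endpoint path, and payload fields shown are assumptions for illustration only; the API reference has the exact contract.

```python
import os
import requests

BASE_URL = "https://api.contextual.ai/v1"  # assumed base URL
AGENT_ID = "your-agent-id"                 # placeholder

response = requests.post(
    f"{BASE_URL}/agents/{AGENT_ID}/query",  # assumed endpoint path
    headers={"Authorization": f"Bearer {os.environ['CONTEXTUAL_API_KEY']}"},
    json={
        # Hypothetical payload: a single user message carrying the Query text.
        "messages": [{"role": "user", "content": "What is our parental leave policy?"}]
    },
)
response.raise_for_status()
print(response.json())  # The Response, with its Knowledge and Attributions, is returned here.
```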
Response
A Response is the output generated by an Agent in response to a Query. Responses come with the relevant retrieved content (Knowledge) and in-line citations (Attributions).
Knowledge
The data retrieved by the Agent from the Datastore to generate its response. When working with unstructured data, Knowledge comes in the form of a list of Document chunks that are relevant to the Query.
Case
A Case is a row of data. It is either a Prompt and Reference (gold-standard answer) pair, or a Prompt, Response, and Knowledge triplet. Evaluation datasets follow the former schema, while Tuning datasets require the latter.
Attribution
Attributions are in-line citations that credit the specific sources of information used by the model to generate a response. When querying Contextual Agents, Attributions are included for each claim made in the response. These attributions can be accessed via the query API response or viewed in the UI by hovering over the in-line tooltips next to each claim (e.g., [1], [2]).
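As a toy sketch, the snippet below walks the attributions in a parsed query response; the field names (attributions, claim, source_ids) and the sample payload are hypothetical placeholders, not the actual response schema.

```python
# Toy example of reading in-line citations from a parsed query response.
# All field names and values below are hypothetical placeholders.
data = {
    "message": "Employees receive 16 weeks of parental leave. [1]",
    "attributions": [
        {"claim": "Employees receive 16 weeks of parental leave.",
         "source_ids": ["doc-1#chunk-3"]},
    ],
}

for attribution in data["attributions"]:
    # Each attribution maps a claim in the generated answer back to the
    # retrieved content (Knowledge) that supports it.
    print(f'{attribution["claim"]} -> sources: {attribution["source_ids"]}')
```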
System Prompt
Instructions that guide an Agent’s response generation, helping define its behavior and capabilities. You can set and modify the System Prompt when creating or editing an Agent via our APIs.
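A minimal sketch, assuming a create-agent endpoint that accepts the System Prompt as part of the Agent configuration; the path and field names are illustrative, not the definitive API shape.

```python
import os
import requests

BASE_URL = "https://api.contextual.ai/v1"  # assumed base URL

# Hypothetical create-agent payload: exact field names may differ, but the
# idea is that the System Prompt is part of the Agent configuration.
payload = {
    "name": "policy-assistant",
    "system_prompt": (
        "You are an HR policy assistant. Answer only from the retrieved "
        "documents, and say so when no relevant information is available."
    ),
}

resp = requests.post(
    f"{BASE_URL}/agents",  # assumed endpoint path
    headers={"Authorization": f"Bearer {os.environ['CONTEXTUAL_API_KEY']}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # Returns the new Agent, including its id.
```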
Document
A Document is a unit of unstructured data ingested into a Datastore, which can be queried and used as the basis for generating responses. Today, we support both pdf and html files, and plan to expand support to other data types. You can ingest Documents into a Datastore via our API. After ingestion, Documents are automatically parsed, chunked, and processed by our platform.
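For illustration, a sketch of ingesting a pdf Document into a Datastore as a multipart upload; the endpoint path and form field name are assumptions.

```python
import os
import requests

BASE_URL = "https://api.contextual.ai/v1"  # assumed base URL
DATASTORE_ID = "your-datastore-id"         # placeholder

with open("employee_handbook.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/datastores/{DATASTORE_ID}/documents",  # assumed endpoint path
        headers={"Authorization": f"Bearer {os.environ['CONTEXTUAL_API_KEY']}"},
        files={"file": ("employee_handbook.pdf", f, "application/pdf")},
    )
resp.raise_for_status()
# Parsing, chunking, and processing then happen automatically on the platform.
print(resp.json())
```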
Datastore
A repository of data associated with an Agent. An Agent retrieves relevant data from its associated Datastores to generate responses. An Agent can connect to multiple Datastores, and each Datastore can serve multiple Agents. You can associate a Datastore with an Agent when creating or editing an Agent via our APIs. We also provide a set of APIs for creating and managing Datastores.
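A sketch of creating a Datastore and then attaching it to an existing Agent by editing the Agent's configuration; the paths, HTTP methods, and field names here are assumptions.

```python
import os
import requests

BASE_URL = "https://api.contextual.ai/v1"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['CONTEXTUAL_API_KEY']}"}

# 1. Create a Datastore (field names are hypothetical).
datastore = requests.post(f"{BASE_URL}/datastores", headers=HEADERS,
                          json={"name": "hr-policies"})
datastore.raise_for_status()
datastore_id = datastore.json()["id"]

# 2. Associate it with an Agent by editing the Agent's configuration.
agent_id = "your-agent-id"  # placeholder
edit = requests.put(f"{BASE_URL}/agents/{agent_id}", headers=HEADERS,
                    json={"datastore_ids": [datastore_id]})  # hypothetical field name
edit.raise_for_status()
```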
Dataset
The Dataset object can be used to store labelled data cases. A case is either (i) a Prompt-Reference pair, or (ii) a Prompt-Reference-Knowledge triplet. Datasets can be used for Evaluation or Tuning. You can create a new Dataset by uploading a CSV or JSONL file via our API.
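As an illustration, an evaluation-style Dataset could be written as a JSONL file with one case per line and then uploaded; the column names (prompt, reference) and the endpoint path are assumptions, so match them to the datasets API reference.

```python
import json
import os
import requests

BASE_URL = "https://api.contextual.ai/v1"  # assumed base URL

# Two Prompt-Reference cases; the column names are hypothetical.
cases = [
    {"prompt": "How many vacation days do new hires get?",
     "reference": "New hires get 20 vacation days per year."},
    {"prompt": "What is the parental leave policy?",
     "reference": "Employees receive 16 weeks of paid parental leave."},
]

with open("eval_cases.jsonl", "w") as f:
    for case in cases:
        f.write(json.dumps(case) + "\n")

with open("eval_cases.jsonl", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/datasets",  # assumed endpoint path
        headers={"Authorization": f"Bearer {os.environ['CONTEXTUAL_API_KEY']}"},
        files={"file": ("eval_cases.jsonl", f, "application/jsonl")},
        data={"name": "hr-eval-set"},
    )
resp.raise_for_status()
```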
The Dataset object can also store evaluation results. Once an evaluation job is completed, it returns a Dataset containing the original Cases from the evaluation, now appended with results such as Equivalence and Groundedness scores for each Case.
Evaluation
You can evaluate your Contextual Agents in two ways:
Firstly, you can assess the Agent’s performance using either a Dataset or an evaluation file. The evaluation file is a set of Prompts (questions) and References (gold-standard answers).
You can create and manage Evaluation jobs via the API. Once an Evaluation is complete, you can view the evaluation metrics and the row-by-row results. The latter are stored in a new Dataset object (read more). We currently support two evaluation metrics:
Equivalence: Uses a Language Model as a judge to evaluate whether the Agent’s response is equivalent to the Reference (or gold-standard answer).
Groundedness: Decomposes the Agent’s response into claims, then uses a Language Model to evaluate whether each claim is grounded in the retrieved documents.
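For this first approach, here is a minimal sketch of launching an Evaluation job against a previously uploaded Dataset and polling for its results; the endpoint paths, payload fields, and metric identifiers are assumptions.

```python
import os
import requests

BASE_URL = "https://api.contextual.ai/v1"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['CONTEXTUAL_API_KEY']}"}
AGENT_ID = "your-agent-id"                 # placeholder

# Launch an Evaluation job (payload fields are hypothetical).
job = requests.post(
    f"{BASE_URL}/agents/{AGENT_ID}/evaluate",
    headers=HEADERS,
    json={"dataset_name": "hr-eval-set", "metrics": ["equivalence", "groundedness"]},
)
job.raise_for_status()
job_id = job.json()["id"]

# Poll the job; once complete, it points at a results Dataset containing
# the per-Case Equivalence and Groundedness scores.
status = requests.get(f"{BASE_URL}/agents/{AGENT_ID}/evaluate/{job_id}", headers=HEADERS)
status.raise_for_status()
print(status.json())
```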
Secondly, you can write natural language unit tests to assess specific criteria in an Agent’s responses. See the LMUnit section below for more information.
LMUnit
An evaluation method using natural language unit tests to assess specific criteria in an Agent’s responses. You can define and evaluate clear, testable statements or questions that capture desirable fine-grained qualities of the Agent’s response – such as “Is the response succinct without omitting essential information?” or “Is the complexity of the response appropriate for the intended audience?” You can create and run these unit tests via our API. Read more about LMUnit in our blog post.
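For illustration, a sketch of scoring a single response against one natural-language unit test; the endpoint path and field names are assumptions.

```python
import os
import requests

BASE_URL = "https://api.contextual.ai/v1"  # assumed base URL

resp = requests.post(
    f"{BASE_URL}/lmunit",  # assumed endpoint path
    headers={"Authorization": f"Bearer {os.environ['CONTEXTUAL_API_KEY']}"},
    json={
        # Hypothetical field names: the original query, the Agent's response,
        # and the natural-language unit test to score it against.
        "query": "What is our parental leave policy?",
        "response": "Employees receive 16 weeks of paid parental leave.",
        "unit_test": "Is the response succinct without omitting essential information?",
    },
)
resp.raise_for_status()
print(resp.json())  # A score indicating how well the response satisfies the unit test.
```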
Reference
A Reference is the gold-standard answer used in Evaluation to compare against an Agent’s generated response. When launching an Evaluation job, you must provide a dataset that has a column for Reference (read more).
Tuning
The process of specializing an Agent to improve its performance on specific tasks or domains while aligning it with your unique requirements. Tuning enables tailored optimization to better meet your use case. You can create and manage tuning jobs through the API.
Once tuning is complete, the tuned model can be deployed to an Agent using the Edit Agent API. Currently, only one tuned model can be deployed per tenant at a time.
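A sketch of launching a Tuning job from a training Dataset and, once it completes, deploying the tuned model with the Edit Agent API; the paths and field names are assumptions.

```python
import os
import requests

BASE_URL = "https://api.contextual.ai/v1"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['CONTEXTUAL_API_KEY']}"}
AGENT_ID = "your-agent-id"                 # placeholder

# 1. Launch a Tuning job (payload fields are hypothetical).
job = requests.post(
    f"{BASE_URL}/agents/{AGENT_ID}/tune",
    headers=HEADERS,
    json={"training_dataset": "hr-tuning-set"},
)
job.raise_for_status()

# 2. Once the job status reports a tuned model id, deploy it via the Edit Agent API.
tuned_model_id = "tuned-model-id-from-job-status"  # placeholder
edit = requests.put(
    f"{BASE_URL}/agents/{AGENT_ID}",
    headers=HEADERS,
    json={"llm_model_id": tuned_model_id},  # hypothetical field name
)
edit.raise_for_status()
```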
Tenant
An organizational unit that owns and manages Agents, Datastores, and other resources within the system. Contextual AI uses Tenants to organize and manage resources, with API keys associated with specific tenants.