Create Evaluation
Launch an Evaluation job which evaluates an Agent on a set of test questions and reference answers.
An Evaluation is an asynchronous operation. Users can select one or more metrics to assess the quality of generated answers. These metrics include equivalence and groundedness. equivalence evaluates if the Agent response is equivalent to the ground truth (model-driven binary classification). groundedness decomposes the Agent response into claims and then evaluates if the claims are grounded by the retrieved documents.
Evaluation data can be provided in one of two forms:
- A CSV evalset_file containing the columns prompt (i.e. questions) and reference (i.e. gold answers).
- An evalset_name which refers to a Dataset created through the /datasets/evaluate API.
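For illustration, here is a minimal Python sketch of launching an evaluation with a CSV evalset_file. The base URL, the endpoint path (/agents/{agent_id}/evaluate), and the multipart encoding of metrics are assumptions; confirm them against your deployment before use.

```python
import requests

# Assumed values -- replace with your own deployment details.
BASE_URL = "https://api.example.com/v1"   # hypothetical base URL
API_TOKEN = "YOUR_API_TOKEN"              # bearer auth token
AGENT_ID = "your-agent-id"                # Agent ID path parameter

# Assumed endpoint path; check your API reference.
url = f"{BASE_URL}/agents/{AGENT_ID}/evaluate"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

# Launch an evaluation from a CSV with `prompt` and `reference` columns.
with open("evalset.csv", "rb") as f:
    response = requests.post(
        url,
        headers=headers,
        data={"metrics": ["equivalence", "groundedness"]},   # repeated form fields
        files={"evalset_file": ("evalset.csv", f, "text/csv")},
    )

response.raise_for_status()
print(response.json())   # contains the ID of the launched evaluation
```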
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Path Parameters
Agent ID of the agent to evaluate
Body
List of metrics to use. Supported metrics are equivalence and groundedness.
Available options: equivalence, groundedness
Evalset file (CSV) to use for evaluation, containing the columns prompt (i.e. question) and reference (i.e. ground truth response). Either evalset_name or evalset_file must be provided, but not both.
Name of the Dataset to use for evaluation, created through the /datasets/evaluate API. Either evalset_name or evalset_file must be provided, but not both.
ID of the model to evaluate. Uses the default model if not specified.
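When the evaluation data already exists as a Dataset, reference it by evalset_name instead of uploading a file. A minimal sketch, assuming this variant accepts a JSON body and that the optional model parameter is named model_id (both assumptions, as is the endpoint path):

```python
import requests

BASE_URL = "https://api.example.com/v1"   # hypothetical base URL
API_TOKEN = "YOUR_API_TOKEN"              # bearer auth token
AGENT_ID = "your-agent-id"                # Agent ID path parameter

payload = {
    "metrics": ["equivalence", "groundedness"],  # one or both supported metrics
    "evalset_name": "my-eval-dataset",           # Dataset created via the /datasets/evaluate API
    # "model_id": "my-model-id",                 # optional; parameter name assumed, defaults to the default model
}

response = requests.post(
    f"{BASE_URL}/agents/{AGENT_ID}/evaluate",    # assumed endpoint path
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
)
response.raise_for_status()
```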
Response
Response from Launch Evaluation request
ID of the launched evaluation
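Because the evaluation runs asynchronously, the immediate response only identifies the launched job. Continuing from either sketch above, and assuming the ID is returned in a field named id:

```python
# Continuing from either request sketch above.
result = response.json()
evaluation_id = result["id"]   # assumed field name for the launched evaluation's ID
print(f"Launched evaluation: {evaluation_id}")
```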