Launch an Evaluation job that evaluates an Agent on a set of test questions and reference answers.

An Evaluation is an asynchronous operation. Users can select one or more metrics to assess the quality of generated answers. These metrics include equivalence and groundedness. equivalence evaluates whether the Agent response is equivalent to the ground truth (a model-driven binary classification). groundedness decomposes the Agent response into claims and evaluates whether each claim is grounded in the retrieved documents.
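Because the job runs asynchronously, a typical client launches it and then polls for the result. The sketch below illustrates this flow with plain HTTP; the base URL, endpoint paths, field names (other than metrics and evalset_name, which appear in this page), and response fields are assumptions for illustration rather than the documented contract.

```python
import time
import requests

API_BASE = "https://api.example.com/v1"   # assumed base URL
AGENT_ID = "your-agent-id"                # assumed Agent identifier
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Launch the evaluation job with one or more metrics (assumed request shape).
launch = requests.post(
    f"{API_BASE}/agents/{AGENT_ID}/evaluate",
    headers=HEADERS,
    data={
        "metrics": ["equivalence", "groundedness"],
        "evalset_name": "my-evalset",     # or upload a CSV evalset_file instead
    },
)
launch.raise_for_status()
job_id = launch.json()["id"]              # assumed response field

# Poll until the asynchronous job finishes (assumed status endpoint and fields).
while True:
    status = requests.get(
        f"{API_BASE}/agents/{AGENT_ID}/evaluate/jobs/{job_id}",
        headers=HEADERS,
    ).json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(10)

print(status)
```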

Evaluation data can be provided in one of two forms:

  • A CSV evalset_file containing the columns prompt (i.e., questions) and reference (i.e., gold answers).

  • An evalset_name that refers to a Dataset created through the /datasets/evaluate API.
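For the first form, the evalset_file is typically sent as multipart form data. The sketch below shows this variant; as above, the endpoint path, headers, and response shape are assumptions for illustration, and only the evalset_file, prompt, reference, and metrics names come from this page.

```python
import requests

API_BASE = "https://api.example.com/v1"   # assumed base URL
AGENT_ID = "your-agent-id"                # assumed Agent identifier
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# evalset.csv must contain the columns "prompt" and "reference".
with open("evalset.csv", "rb") as f:
    resp = requests.post(
        f"{API_BASE}/agents/{AGENT_ID}/evaluate",
        headers=HEADERS,
        files={"evalset_file": ("evalset.csv", f, "text/csv")},
        data={"metrics": ["equivalence", "groundedness"]},
    )
resp.raise_for_status()
print(resp.json())  # assumed to include the evaluation job id
```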
