Generate
Generate a response using Contextual’s Grounded Language Model (GLM), an LLM engineered specifically to prioritize faithfulness to in-context retrievals over parametric knowledge to reduce hallucinations in Retrieval-Augmented Generation and agentic use cases.
The total request cannot exceed 32,000 tokens.
See our blog post and code examples. Email glm-feedback@contextual.ai with any feedback or questions.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
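As a minimal sketch, the headers for an authenticated request can be assembled like this in Python (the token value below is a placeholder, not a real credential):

```python
# Build the Authorization header for a request to the /generate endpoint.
# "YOUR_API_TOKEN" is a placeholder; substitute your actual auth token.
token = "YOUR_API_TOKEN"
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}
print(headers["Authorization"])  # Bearer YOUR_API_TOKEN
```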
Body
/generate input request object.
The version of Contextual's GLM to use. Currently, only "v1" is available.
List of messages in the conversation so far. The last message must be from the user.
Message object for a message received in the /generate request.
The knowledge sources the model can use when generating a response.
Instructions that the model follows when generating responses. Note that we do not guarantee that the model follows these instructions exactly.
Flag to indicate whether the model should avoid providing additional commentary in responses. Commentary is conversational in nature and does not contain verifiable claims; therefore, commentary is not strictly grounded in available context. However, commentary may provide useful context which improves the helpfulness of responses.
The sampling temperature, which affects the randomness in the response. Note that higher temperature values can reduce groundedness.
0 <= x <= 1
A parameter for nucleus sampling, an alternative to temperature which also affects the randomness of the response. Note that higher top_p values can reduce groundedness.
0 < x <= 1
The maximum number of tokens that the model can generate in the response.
1 <= x <= 2048
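Putting the parameters above together, a request body might look like the sketch below. Note that field names other than "messages" and "top_p" (e.g. "model", "knowledge", "system_prompt", "avoid_commentary", "max_new_tokens") are assumptions inferred from the parameter descriptions, not confirmed schema; check the official SDK or code examples for the exact names.

```python
# Sketch of a /generate request body, with the documented range
# constraints validated locally before sending. Field names marked
# "assumed" are guesses based on the descriptions above.
payload = {
    "model": "v1",  # assumed name for the GLM version field
    "messages": [   # the last message must be from the user
        {"role": "user", "content": "What is the refund policy?"}
    ],
    "knowledge": [  # assumed name for the knowledge sources
        "Refunds are available within 30 days of purchase."
    ],
    "system_prompt": "Answer using only the provided knowledge.",  # assumed name
    "avoid_commentary": False,  # assumed name for the commentary flag
    "temperature": 0.0,   # 0 <= x <= 1; higher values can reduce groundedness
    "top_p": 0.9,         # 0 < x <= 1; higher values can reduce groundedness
    "max_new_tokens": 1024,  # assumed name; 1 <= x <= 2048
}

# Check the documented parameter constraints.
assert 0 <= payload["temperature"] <= 1
assert 0 < payload["top_p"] <= 1
assert 1 <= payload["max_new_tokens"] <= 2048
assert payload["messages"][-1]["role"] == "user"
```

Keeping temperature at 0 is the conservative choice here, since the descriptions above note that both higher temperature and higher top_p can reduce groundedness.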
Response
/generate result object.
The model's response to the last user message.
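As a sketch of consuming the result object, the snippet below assumes the reply is returned under a "response" key; that field name is a guess based on the description above, so verify it against the actual result schema.

```python
# Hypothetical /generate result object; the "response" field name is
# an assumption, and the text is illustrative only.
result = {"response": "Refunds are available within 30 days of purchase."}

# Extract the model's reply to the last user message.
answer = result["response"]
print(answer)
```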