Evaluation
any_agent.evaluation.LlmJudge
Source code in src/any_agent/evaluation/llm_judge.py
run(context, question, prompt_template=DEFAULT_PROMPT_TEMPLATE)
Run the judge synchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | str | Any relevant information that may be needed to answer the question | required |
question | str | The question to ask the agent | required |
prompt_template | str | The prompt to use for the LLM | DEFAULT_PROMPT_TEMPLATE |
Returns:
Type | Description |
---|---|
BaseModel | The evaluation result |
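The `prompt_template` parameter is a plain string. The contents of `DEFAULT_PROMPT_TEMPLATE` are not shown on this page, so the following is only a sketch of how a custom template might be filled in, assuming it exposes `{context}` and `{question}` placeholders. `CUSTOM_TEMPLATE` and `render_prompt` are hypothetical names used for illustration and are not part of any_agent.

```python
# Hypothetical template; the placeholder names {context} and {question}
# are an assumption, not taken from DEFAULT_PROMPT_TEMPLATE.
CUSTOM_TEMPLATE = (
    "Use the context to decide whether the question is satisfied.\n"
    "Context: {context}\n"
    "Question: {question}\n"
)


def render_prompt(
    context: str, question: str, prompt_template: str = CUSTOM_TEMPLATE
) -> str:
    """Fill the template the way a judge might before calling the LLM."""
    return prompt_template.format(context=context, question=question)


prompt = render_prompt(
    context="The agent replied: 'Paris is the capital of France.'",
    question="Did the agent name the capital of France?",
)
print(prompt)
```

If the real template uses different placeholder names, a custom string passed as `prompt_template` would need to match them.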
run_async(context, question, prompt_template=DEFAULT_PROMPT_TEMPLATE)
async
Run the judge asynchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | str | Any relevant information that may be needed to answer the question | required |
question | str | The question to ask the agent | required |
prompt_template | str | The prompt to use for the LLM | DEFAULT_PROMPT_TEMPLATE |
Returns:
Type | Description |
---|---|
BaseModel | The evaluation result |
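Because `run_async` is a coroutine, several evaluations can be awaited concurrently. The sketch below uses `StubJudge`, a hypothetical stand-in that mirrors the documented `run_async(context, question)` signature so the example runs without a model. `StubResult` stands in for the returned `BaseModel`; its `passed` and `reasoning` fields are assumptions, not the documented result schema.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    """Stand-in for the BaseModel result; field names are assumptions."""

    passed: bool
    reasoning: str


class StubJudge:
    """Hypothetical stand-in mirroring the documented run_async signature."""

    async def run_async(self, context: str, question: str) -> StubResult:
        await asyncio.sleep(0)  # placeholder for the real LLM call
        # Toy rule for the sketch: pass if the last word of the
        # question appears in the context.
        keyword = question.rstrip("?").split()[-1].lower()
        passed = keyword in context.lower()
        return StubResult(passed=passed, reasoning=f"looked for '{keyword}'")


async def evaluate_all(judge: StubJudge, context: str, questions: list[str]) -> list[StubResult]:
    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(judge.run_async(context, q) for q in questions))


questions = ["Does the answer mention Paris?", "Does the answer mention Berlin?"]
results = asyncio.run(
    evaluate_all(StubJudge(), "The capital of France is Paris.", questions)
)
for question, result in zip(questions, results):
    print(question, "->", result.passed)
```

With a real judge, the same `evaluate_all` pattern should apply as long as the object exposes a compatible `run_async` coroutine.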
any_agent.evaluation.AgentJudge
An agent that evaluates the correctness of another agent's trace.
Source code in src/any_agent/evaluation/agent_judge.py
run(trace, question, additional_tools=None)
Run the agent judge synchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trace | AgentTrace | The agent trace to evaluate | required |
question | str | The question to ask the agent | required |
additional_tools | list[Callable[[], Any]] \| None | Additional tools to use for the agent | None |
Returns:
Type | Description |
---|---|
BaseModel | The evaluation result |
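The `additional_tools` parameter is typed as `list[Callable[[], Any]] | None`, meaning each tool is a callable that takes no arguments. A minimal sketch of building such tools follows, assuming any data a tool needs is bound ahead of time through a closure. `EXPECTED_ANSWERS`, `get_word_limit`, and `make_expected_answer_tool` are hypothetical names for illustration. How the judge actually invokes the tools is not shown on this page.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical reference data a judge might want to consult.
EXPECTED_ANSWERS = {"capital_of_france": "Paris"}


def get_word_limit() -> int:
    """Return the maximum number of words allowed in the final answer."""
    return 50


def make_expected_answer_tool(key: str) -> Callable[[], str]:
    """Build a zero-argument tool by closing over the lookup key."""

    def get_expected_answer() -> str:
        """Return the reference answer for the question under evaluation."""
        return EXPECTED_ANSWERS[key]

    return get_expected_answer


# Every entry matches the documented element type Callable[[], Any].
additional_tools: list[Callable[[], Any]] = [
    get_word_limit,
    make_expected_answer_tool("capital_of_france"),
]

for tool in additional_tools:
    print(tool.__name__, "->", tool())
```

Named inner functions are used here instead of `functools.partial` so that each tool keeps a `__name__` and a docstring, which agent frameworks commonly read when describing tools to a model.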
run_async(trace, question, additional_tools=None)
async
Run the agent judge asynchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trace | AgentTrace | The agent trace to evaluate | required |
question | str | The question to ask the agent | required |
additional_tools | list[Callable[[], Any]] \| None | Additional tools to use for the agent | None |
Returns: The evaluation result