Local Agent¶
(no Google Colab for this one since it is meant to be run entirely locally)
This tutorial will guide you through running an agent fully locally/offline, i.e., with a local LLM and a couple of local tools (callable Python functions), so no data will leave your machine!
This can be especially useful for privacy-sensitive applications or when you want to avoid any cloud dependencies.
In this example, we will showcase how to let an agent read and write in your local filesystem! Specifically, we will give the agent read access to some files in our codebase and ask it to create a short report of potential vulnerabilities or bugs in the codebase and write that report to a file.
Install Dependencies¶
any-agent uses the python asyncio module to support async functionality. When running in Jupyter notebooks, this means we need to enable the use of nested event loops. We'll install any-agent and enable this below using nest_asyncio.
%pip install 'any-agent' --quiet
import nest_asyncio
nest_asyncio.apply()
Set up your own LLM locally¶
Regardless of which agent framework you choose in any-agent, all of them support LiteLLM, a proxy that lets us use an LLM hosted by any provider inside the framework. For example, we could use a local model via llama.cpp or llamafile, a Google-hosted Gemini model, or an AWS Bedrock-hosted Llama model. For this example, we will use Ollama to run our LLM locally!
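For reference, the prefix of the LiteLLM model_id is what selects the provider. A few illustrative values (the specific model names below are examples, not recommendations):

# The model_id prefix selects the LiteLLM provider; these names are examples.
local_ollama = "ollama/granite3.3:8b"  # local model served by Ollama
hosted_gemini = "gemini/gemini-2.0-flash"  # Google-hosted Gemini model
hosted_bedrock = "bedrock/meta.llama3-1-8b-instruct-v1:0"  # AWS Bedrock-hosted Llama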
Ollama setup¶
First, install Ollama by following their instructions: https://ollama.com/download
Picking an LLM¶
Pick a model that you can run locally based on your hardware and download it from your terminal. For example:
- 16–24 GB RAM -> granite3.3:8b or gemma3:12b with ~10–20k context length
- 24+ GB RAM -> mistral-small3.2:24b or devstral:24b with ~20k+ context length
NOTE: Smaller models have been shown to be more inconsistent and less capable, especially as task complexity increases. If you have the hardware, we highly recommend using a larger model like mistral-small3.2:24b for better performance.
Serving the model with the appropriate context length¶
By default, Ollama applies a context length of 8192 tokens to all models, which is not enough for our agent to work properly.
We can simply pass our desired context length to the AgentConfig of any-agent as a model argument (num_ctx is Ollama-specific), like so:
AgentConfig(
    model_id="ollama/granite3.3:8b",
    instructions="You must use the available tools to solve the task.",
    tools=[mcp_filesystem, show_plan],
    model_args={"num_ctx": 14000},
)
References: AgentConfig, num_ctx
All four of the models above have a max context length of 128k tokens, but if you have limited RAM, setting it to 128k might cause memory issues. For this example, we will set it to 14,000 tokens and provide a relatively small codebase.
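As a rough sanity check before running, you can estimate how many tokens the file you plan to feed the agent will consume. The ~4-characters-per-token ratio below is a common heuristic, not an exact tokenizer:

# Rough context-budget check: ~4 characters per token is a common heuristic.
file_text = open("../../demo/app.py").read()
print(f"~{len(file_text) // 4} tokens of file content; leave headroom for "
      f"the instructions, tool schemas, the trace, and the model's own output.")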
Load the model¶
First, download the Ollama model locally:
ollama pull granite3.3:8b
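Optionally, you can confirm the model is available by querying Ollama's local API. This assumes Ollama is running on its default port (11434) and that you have requests installed:

import requests

# List the locally available models and check the one we pulled is present.
models = requests.get("http://localhost:11434/api/tags").json()["models"]
print([m["name"] for m in models])  # should include "granite3.3:8b"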
Configure the Agent and the Tools¶
Pick which tools to use¶
Since we want our agent to work fully locally/offline, we will not add any tools that require communication with remote servers. In this example we are using Python callable functions as tools, but we could also have used MCP servers that run fully locally (e.g., mcp/filesystem).
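For reference, a fully local MCP filesystem server could be wired in through any-agent's MCPStdio config, roughly like the sketch below. The docker arguments and the mounted source path are placeholders; adjust them to the directory you want to expose:

from any_agent.config import MCPStdio

# Sketch of a fully local MCP server config; the src path is a placeholder.
mcp_filesystem = MCPStdio(
    command="docker",
    args=[
        "run", "-i", "--rm",
        "--mount", "type=bind,src=/absolute/path/to/codebase,dst=/projects",
        "mcp/filesystem",
        "/projects",
    ],
)

For this tutorial, though, we stick to plain Python functions: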
def read_file(file_path: str) -> str:
    """
    Read a file from the file path given and return its content.

    Args:
        file_path: The path of the file to read.
    """
    try:
        with open(file_path) as file:
            content = file.read()
    except Exception as e:
        content = f"Error reading file: {file_path} \n\n {e}"
    return content
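The introduction also mentioned writing the report to a file. A complementary write tool can follow the same pattern (this write_file helper is illustrative and not part of the original notebook):

def write_file(file_path: str, content: str) -> str:
    """
    Write the given content to the file path given.

    Args:
        file_path: The path of the file to write.
        content: The content to write to the file.
    """
    try:
        with open(file_path, "w") as file:
            file.write(content)
        result = f"Successfully wrote to: {file_path}"
    except Exception as e:
        result = f"Error writing file: {file_path} \n\n {e}"
    return result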
Now that you have downloaded your LLM and defined your tools, you need to pick the agent framework to build your agent with. Note that the agent you'll build with any-agent can be run across multiple agent frameworks (smolagents, TinyAgent, OpenAI, etc.) and across various LLMs (Llama, DeepSeek, Mistral, etc.). For this example, we will use the tinyagent framework.
import os

from any_agent import AgentConfig, AnyAgent

if os.getenv("IN_PYTEST") == "1":
    # Don't need to worry about this when running locally.
    # We use this for our automated testing https://github.com/mozilla-ai/any-agent/blob/main/.github/workflows/tests-cookbook.yaml
    # HF_ENDPOINT points to a Qwen/Qwen3-1.7B running in https://endpoints.huggingface.co/
    # See https://docs.litellm.ai/docs/providers/huggingface#dedicated-inference-endpoints for more info.
    model_id = "huggingface/tgi"
    model_args = {"api_base": os.environ["HF_ENDPOINT"]}
else:
    model_id = "ollama/granite3.3:8b"
    model_args = {"num_ctx": 32000}

agent = AnyAgent.create(
    "tinyagent",
    AgentConfig(
        model_id=model_id,
        instructions="You are an expert summarizer.",
        tools=[read_file],
        model_args=model_args,
    ),
)
Run the Agent¶
from pathlib import Path

abs_path = Path("../../demo/app.py").resolve()

agent_trace = agent.run(
    f"Carefully read the file in the path: {abs_path} and return a summary of what the code does."
)
╭─────────────────────────────────────────── CALL_LLM: huggingface/tgi ───────────────────────────────────────────╮ │ ╭─ INPUT ─────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ [ │ │ │ │ { │ │ │ │ "role": "system", │ │ │ │ "content": "You are an expert summarizer." │ │ │ │ }, │ │ │ │ { │ │ │ │ "role": "user", │ │ │ │ "content": "Carefully read the file in the path: /home/daavoo/any-agent/demo/app.py and return summary │ │ │ │ } │ │ │ │ ] │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ ╭─ OUTPUT ────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ [ │ │ │ │ { │ │ │ │ "tool.name": "read_file", │ │ │ │ "tool.args": "{\"file_path\":\"/home/daavoo/any-agent/demo/app.py\"}" │ │ │ │ } │ │ │ │ ] │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ ╭─ USAGE ─────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ { │ │ │ │ "input_tokens": 311, │ │ │ │ "output_tokens": 30, │ │ │ │ "input_cost": 0, │ │ │ │ "output_cost": 0 │ │ │ │ } │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────── EXECUTE_TOOL: read_file ────────────────────────────────────────────╮ │ ╭─ Input ─────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ { │ │ │ │ "file_path": "/home/daavoo/any-agent/demo/app.py" │ │ │ │ } │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ ╭─ OUTPUT ────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ import asyncio │ │ │ │ │ │ │ │ import nest_asyncio import streamlit as st from components.sidebar import ssf_sidebar from constants import │ │ │ │ DEFAULT_TOOLS from services.agent import ( configure_agent, display_evaluation_results, display_output, │ │ │ │ evaluate_agent, run_agent, ) │ │ │ │ │ │ │ │ nest_asyncio.apply() │ │ │ │ │ │ │ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │ │ │ ┃ Set page config ┃ │ │ │ │ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ │ │ │ │ │ │ │ │ st.set_page_config(page_title="Surf Spot Finder", page_icon="🏄", layout="wide") │ │ │ │ │ │ │ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │ │ │ ┃ Allow a user to resize the sidebar to take up most of the screen to make editing eval cases easier ┃ │ │ │ │ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ st.markdown( """ /* When sidebar is expanded, adjust main content */ │ │ │ │ section[data-testid="stSidebar"][aria-expanded="true"] { max-width: 99% !important; } """, │ │ │ │ unsafe_allow_html=True, ) │ │ │ │ │ │ │ │ with st.sidebar: user_inputs = ssf_sidebar() is_valid = user_inputs is not None run_button = st.button("Run │ │ │ │ Agent 🤖", disabled=not is_valid, type="primary") │ │ │ │ │ │ │ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │ │ │ ┃ Main content ┃ │ │ │ │ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ │ │ │ │ │ │ │ │ async def main(): # Handle agent execution button click if run_button: agent, agent_config = await │ │ │ │ configure_agent(user_inputs) agent_trace = await run_agent(agent, agent_config) │ │ │ │ │ │ │ │ │ │ │ │ await display_output(agent_trace) │ │ │ │ │ │ │ │ evaluation_result = await evaluate_agent(agent_config, agent_trace) │ │ │ │ │ │ │ │ await display_evaluation_results(evaluation_result) │ │ │ │ else: │ │ │ │ st.title("🏄 Surf Spot Finder") │ │ │ │ st.markdown( │ │ │ │ "Find the best surfing spots based on your location and preferences! [Github │ │ │ │ Repo](https://github.com/mozilla-ai/surf-spot-finder)" │ │ │ │ ) │ │ │ │ st.info( │ │ │ │ "👈 Configure your search parameters in the sidebar and click Run to start!" │ │ │ │ ) │ │ │ │ │ │ │ │ # Display tools in a more organized way │ │ │ │ st.markdown("### 🛠️ Available Tools") │ │ │ │ │ │ │ │ st.markdown(""" │ │ │ │ The AI Agent built for this project has a few tools available for use in order to find the perfect │ │ │ │ surf spot. │ │ │ │ The agent is given the freedom to use (or not use) these tools in order to accomplish the task. 
│ │ │ │ """) │ │ │ │ │ │ │ │ weather_tools = [ │ │ │ │ tool │ │ │ │ for tool in DEFAULT_TOOLS │ │ │ │ if "forecast" in tool.__name__ or "weather" in tool.__name__ │ │ │ │ ] │ │ │ │ for tool in weather_tools: │ │ │ │ with st.expander(f"🌤️ {tool.__name__}"): │ │ │ │ st.markdown(tool.__doc__ or "No description available") │ │ │ │ location_tools = [ │ │ │ │ tool │ │ │ │ for tool in DEFAULT_TOOLS │ │ │ │ if "lat" in tool.__name__ │ │ │ │ or "lon" in tool.__name__ │ │ │ │ or "area" in tool.__name__ │ │ │ │ ] │ │ │ │ for tool in location_tools: │ │ │ │ with st.expander(f"📍 {tool.__name__}"): │ │ │ │ st.markdown(tool.__doc__ or "No description available") │ │ │ │ │ │ │ │ web_tools = [ │ │ │ │ tool │ │ │ │ for tool in DEFAULT_TOOLS │ │ │ │ if "web" in tool.__name__ or "search" in tool.__name__ │ │ │ │ ] │ │ │ │ for tool in web_tools: │ │ │ │ with st.expander(f"🌐 {tool.__name__}"): │ │ │ │ st.markdown(tool.__doc__ or "No description available") │ │ │ │ │ │ │ │ # add a check that all tools were listed │ │ │ │ if len(weather_tools) + len(location_tools) + len(web_tools) != len( │ │ │ │ DEFAULT_TOOLS │ │ │ │ ): │ │ │ │ st.warning( │ │ │ │ "Some tools are not listed. Please check the code for more details." │ │ │ │ ) │ │ │ │ │ │ │ │ # Add Custom Evaluation explanation section │ │ │ │ st.markdown("### 📊 Custom Evaluation") │ │ │ │ st.markdown(""" │ │ │ │ The Surf Spot Finder includes a powerful evaluation system that allows you to customize how the │ │ │ │ agent's performance is assessed. │ │ │ │ You can find these settings in the sidebar under the "Custom Evaluation" expander. │ │ │ │ """) │ │ │ │ │ │ │ │ with st.expander("Learn more about Custom Evaluation"): │ │ │ │ st.markdown(""" │ │ │ │ #### What is Custom Evaluation? │ │ │ │ The Custom Evaluation feature uses an LLM-as-a-Judge approach to evaluate how well the agent │ │ │ │ performs its task. │ │ │ │ An LLM will be given the complete agent trace (not just the final answer), and will assess the │ │ │ │ agent's performance based on the criteria you set. │ │ │ │ You can customize: │ │ │ │ │ │ │ │ - **Evaluation Model**: Choose which LLM should act as the judge │ │ │ │ - **Evaluation Criteria**: Define specific checkpoints that the agent should meet │ │ │ │ - **Scoring System**: Assign points to each criterion │ │ │ │ │ │ │ │ #### How to Use Custom Evaluation │ │ │ │ │ │ │ │ 1. **Select an Evaluation Model**: Choose which LLM you want to use as the judge │ │ │ │ 2. **Edit Checkpoints**: Use the data editor to: │ │ │ │ - Add new evaluation criteria │ │ │ │ - Modify existing criteria │ │ │ │ - Adjust point values │ │ │ │ - Remove criteria you don't want to evaluate │ │ │ │ │ │ │ │ #### Example Criteria │ │ │ │ You can evaluate things like: │ │ │ │ - Tool usage and success │ │ │ │ - Order of operations │ │ │ │ - Quality of final recommendations │ │ │ │ - Response completeness │ │ │ │ - Number of steps taken │ │ │ │ │ │ │ │ #### Tips for Creating Good Evaluation Criteria │ │ │ │ - Be specific about what you want to evaluate │ │ │ │ - Use clear, unambiguous language │ │ │ │ - Consider both process (how the agent works) and outcome (what it produces) │ │ │ │ - Assign appropriate point values based on importance │ │ │ │ │ │ │ │ The evaluation results will be displayed after each agent run, showing how well the agent met │ │ │ │ your custom criteria. 
│ │ │ │ """) │ │ │ │ │ │ │ │ │ │ │ │ if name == "main": loop = asyncio.new_event_loop() loop.run_until_complete(main()) │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────── CALL_LLM: huggingface/tgi ───────────────────────────────────────────╮ │ ╭─ OUTPUT ────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ [ │ │ │ │ { │ │ │ │ "tool.name": "read_file", │ │ │ │ "tool.args": "{\"file_path\":\"/home/daavoo/any-agent/demo/app.py\"}" │ │ │ │ } │ │ │ │ ] │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ ╭─ USAGE ─────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ { │ │ │ │ "input_tokens": 1525, │ │ │ │ "output_tokens": 29, │ │ │ │ "input_cost": 0, │ │ │ │ "output_cost": 0 │ │ │ │ } │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────── EXECUTE_TOOL: read_file ────────────────────────────────────────────╮ │ ╭─ Input ─────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ { │ │ │ │ "file_path": "/home/daavoo/any-agent/demo/app.py" │ │ │ │ } │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ ╭─ OUTPUT ────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ import asyncio │ │ │ │ │ │ │ │ import nest_asyncio import streamlit as st from components.sidebar import ssf_sidebar from constants import │ │ │ │ DEFAULT_TOOLS from services.agent import ( configure_agent, display_evaluation_results, display_output, │ │ │ │ evaluate_agent, run_agent, ) │ │ │ │ │ │ │ │ nest_asyncio.apply() │ │ │ │ │ │ │ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │ │ │ ┃ Set page config ┃ │ │ │ │ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ │ │ │ │ │ │ │ │ st.set_page_config(page_title="Surf Spot Finder", page_icon="🏄", layout="wide") │ │ │ │ │ │ │ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │ │ │ ┃ Allow a user to resize the sidebar to take up most of the screen to make editing eval cases easier ┃ │ │ │ │ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ st.markdown( """ /* When sidebar is expanded, adjust main content */ │ │ │ │ section[data-testid="stSidebar"][aria-expanded="true"] { max-width: 99% !important; } """, │ │ │ │ unsafe_allow_html=True, ) │ │ │ │ │ │ │ │ with st.sidebar: user_inputs = ssf_sidebar() is_valid = user_inputs is not None run_button = st.button("Run │ │ │ │ Agent 🤖", disabled=not is_valid, type="primary") │ │ │ │ │ │ │ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │ │ │ ┃ Main content ┃ │ │ │ │ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ │ │ │ │ │ │ │ │ async def main(): # Handle agent execution button click if run_button: agent, agent_config = await │ │ │ │ configure_agent(user_inputs) agent_trace = await run_agent(agent, agent_config) │ │ │ │ │ │ │ │ │ │ │ │ await display_output(agent_trace) │ │ │ │ │ │ │ │ evaluation_result = await evaluate_agent(agent_config, agent_trace) │ │ │ │ │ │ │ │ await display_evaluation_results(evaluation_result) │ │ │ │ else: │ │ │ │ st.title("🏄 Surf Spot Finder") │ │ │ │ st.markdown( │ │ │ │ "Find the best surfing spots based on your location and preferences! [Github │ │ │ │ Repo](https://github.com/mozilla-ai/surf-spot-finder)" │ │ │ │ ) │ │ │ │ st.info( │ │ │ │ "👈 Configure your search parameters in the sidebar and click Run to start!" │ │ │ │ ) │ │ │ │ │ │ │ │ # Display tools in a more organized way │ │ │ │ st.markdown("### 🛠️ Available Tools") │ │ │ │ │ │ │ │ st.markdown(""" │ │ │ │ The AI Agent built for this project has a few tools available for use in order to find the perfect │ │ │ │ surf spot. │ │ │ │ The agent is given the freedom to use (or not use) these tools in order to accomplish the task. 
│ │ │ │ """) │ │ │ │ │ │ │ │ weather_tools = [ │ │ │ │ tool │ │ │ │ for tool in DEFAULT_TOOLS │ │ │ │ if "forecast" in tool.__name__ or "weather" in tool.__name__ │ │ │ │ ] │ │ │ │ for tool in weather_tools: │ │ │ │ with st.expander(f"🌤️ {tool.__name__}"): │ │ │ │ st.markdown(tool.__doc__ or "No description available") │ │ │ │ location_tools = [ │ │ │ │ tool │ │ │ │ for tool in DEFAULT_TOOLS │ │ │ │ if "lat" in tool.__name__ │ │ │ │ or "lon" in tool.__name__ │ │ │ │ or "area" in tool.__name__ │ │ │ │ ] │ │ │ │ for tool in location_tools: │ │ │ │ with st.expander(f"📍 {tool.__name__}"): │ │ │ │ st.markdown(tool.__doc__ or "No description available") │ │ │ │ │ │ │ │ web_tools = [ │ │ │ │ tool │ │ │ │ for tool in DEFAULT_TOOLS │ │ │ │ if "web" in tool.__name__ or "search" in tool.__name__ │ │ │ │ ] │ │ │ │ for tool in web_tools: │ │ │ │ with st.expander(f"🌐 {tool.__name__}"): │ │ │ │ st.markdown(tool.__doc__ or "No description available") │ │ │ │ │ │ │ │ # add a check that all tools were listed │ │ │ │ if len(weather_tools) + len(location_tools) + len(web_tools) != len( │ │ │ │ DEFAULT_TOOLS │ │ │ │ ): │ │ │ │ st.warning( │ │ │ │ "Some tools are not listed. Please check the code for more details." │ │ │ │ ) │ │ │ │ │ │ │ │ # Add Custom Evaluation explanation section │ │ │ │ st.markdown("### 📊 Custom Evaluation") │ │ │ │ st.markdown(""" │ │ │ │ The Surf Spot Finder includes a powerful evaluation system that allows you to customize how the │ │ │ │ agent's performance is assessed. │ │ │ │ You can find these settings in the sidebar under the "Custom Evaluation" expander. │ │ │ │ """) │ │ │ │ │ │ │ │ with st.expander("Learn more about Custom Evaluation"): │ │ │ │ st.markdown(""" │ │ │ │ #### What is Custom Evaluation? │ │ │ │ The Custom Evaluation feature uses an LLM-as-a-Judge approach to evaluate how well the agent │ │ │ │ performs its task. │ │ │ │ An LLM will be given the complete agent trace (not just the final answer), and will assess the │ │ │ │ agent's performance based on the criteria you set. │ │ │ │ You can customize: │ │ │ │ │ │ │ │ - **Evaluation Model**: Choose which LLM should act as the judge │ │ │ │ - **Evaluation Criteria**: Define specific checkpoints that the agent should meet │ │ │ │ - **Scoring System**: Assign points to each criterion │ │ │ │ │ │ │ │ #### How to Use Custom Evaluation │ │ │ │ │ │ │ │ 1. **Select an Evaluation Model**: Choose which LLM you want to use as the judge │ │ │ │ 2. **Edit Checkpoints**: Use the data editor to: │ │ │ │ - Add new evaluation criteria │ │ │ │ - Modify existing criteria │ │ │ │ - Adjust point values │ │ │ │ - Remove criteria you don't want to evaluate │ │ │ │ │ │ │ │ #### Example Criteria │ │ │ │ You can evaluate things like: │ │ │ │ - Tool usage and success │ │ │ │ - Order of operations │ │ │ │ - Quality of final recommendations │ │ │ │ - Response completeness │ │ │ │ - Number of steps taken │ │ │ │ │ │ │ │ #### Tips for Creating Good Evaluation Criteria │ │ │ │ - Be specific about what you want to evaluate │ │ │ │ - Use clear, unambiguous language │ │ │ │ - Consider both process (how the agent works) and outcome (what it produces) │ │ │ │ - Assign appropriate point values based on importance │ │ │ │ │ │ │ │ The evaluation results will be displayed after each agent run, showing how well the agent met │ │ │ │ your custom criteria. 
│ │ │ │ """) │ │ │ │ │ │ │ │ │ │ │ │ if name == "main": loop = asyncio.new_event_loop() loop.run_until_complete(main()) │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────── CALL_LLM: huggingface/tgi ───────────────────────────────────────────╮ │ ╭─ OUTPUT ────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ [ │ │ │ │ { │ │ │ │ "tool.name": "final_answer", │ │ │ │ "tool.args": "{\"answer\":\"The provided code is a Streamlit application for the Surf Spot Finder, an A │ │ │ │ } │ │ │ │ ] │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ ╭─ USAGE ─────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ { │ │ │ │ "input_tokens": 2739, │ │ │ │ "output_tokens": 84, │ │ │ │ "input_cost": 0, │ │ │ │ "output_cost": 0 │ │ │ │ } │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────── EXECUTE_TOOL: final_answer ───────────────────────────────────────────╮ │ ╭─ Input ─────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ { │ │ │ │ "answer": "The provided code is a Streamlit application for the Surf Spot Finder, an AI-powered tool desi │ │ │ │ } │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ ╭─ OUTPUT ────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ The provided code is a Streamlit application for the Surf Spot Finder, an AI-powered tool designed to help │ │ │ │ users find the best surfing spots based on their location and preferences. It includes a sidebar for │ │ │ │ configuring search parameters and a button to run an agent. The agent uses various tools to gather │ │ │ │ information, and the results are displayed with an evaluation system that allows customization. │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
View the results¶
The agent.run method returns an AgentTrace object, which has a few convenient attributes for displaying some interesting information about the run.
print(agent_trace.final_output) # Final answer
print(f"Duration: {agent_trace.duration.total_seconds():.2f} seconds")
The provided code is a Streamlit application for the Surf Spot Finder, an AI-powered tool designed to help users find the best surfing spots based on their location and preferences. It includes a sidebar for configuring search parameters and a button to run an agent. The agent uses various tools to gather information, and the results are displayed with an evaluation system that allows customization. Duration: 7.08 seconds
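If you want to keep the full trace around for later inspection, AgentTrace is a Pydantic model in recent any-agent versions, so it can be dumped to JSON (a sketch, assuming that API):

from pathlib import Path

# Persist the full trace (assumes AgentTrace is a Pydantic model,
# as in recent any-agent releases).
Path("agent_trace.json").write_text(agent_trace.model_dump_json(indent=2))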