Local Agent¶

(no Google Colab for this one since it is meant to be run entirely locally)

This tutorial will guide on how to run agent, fully locally / offline i.e., running with a local LLM and a couple of local tools (Callable Python functions), so no data will leave your machine!

This can be especially useful for privacy-sensitive applications or when you want to avoid any cloud dependencies.

In this example, we will showcase how to let an agent read and write in your local filesystem! Specifically, we will give read-access to the agent to some files in our codebase and ask it to create a short report of potential vulnerabilities or bugs in the codebase, and write it to a file.

Install Dependencies¶

any-agent uses the python asyncio module to support async functionality. When running in Jupyter notebooks, this means we need to enable the use of nested event loops. We'll install any-agent and enable this below using nest_asyncio.

In [ ]:

Copied!

%pip install 'any-agent' --quiet

import nest_asyncio

nest_asyncio.apply()
%pip install 'any-agent' --quiet

import nest_asyncio

nest_asyncio.apply()

Set up your own LLM locally¶

Regardless of which agent framework you choose in any-agent, all of them support any-llm, which is a proxy that allows us to use whichever LLM inside the framework, hosted on by any provider. For example, we could use a local model via llama.cpp or llamafile, a google hosted gemini model, or a AWS bedrock hosted Llama model. For this example, we will use Ollama to run our LLM locally!

Ollama setup¶

First, install Ollama by following their instructions: https://ollama.com/download

Picking an LLM¶

Pick a model that you can run locally based on your hardware and download it from your terminal. For example:

16-24GB RAM -> granite3.3:8b or gemma3:12b with ~10–20k context length

24+GB RAM -> mistral-small3.2:24b or devstral:24b with ~20k+ context length

NOTE: Smaller models have shown to be more inconsistent and less capable, especially as task complexity increases. If you have the hardware, we highly recommend using the larger models like mistral-small3.2:24b for better performance.

Serving the model with the appropriate context length¶

By default, Ollama forces a context length of 8192 tokens to all models, which is not enough for our agent to work properly. We can simply pass to the AgentConfig of any-agent as a model argument (num_ctx is ollama-specific) our desired value of context length like so:

    AgentConfig(
        model_id="ollama/granite3.3:8b",
        instructions="You must use the available tools to solve the task.",
        tools=[mcp_filesystem, show_plan],
        model_args={"num_ctx": 14000},

References: AgentConfig, num_ctx

All four of the models above have a max context length of 128k tokens, but if you have limited RAM if you set it to 128k it might cause you memory issues. For this example, we will set it to 14.000 tokens and provide a relatively small codebase.

Load the model¶

Firstly, download the Ollama model locally:

ollama pull granite3.3:8b

Configure the Agent and the Tools¶

Pick which tools to use¶

Since we want our agent to work fully locally/offline, we will not add any tools that require communication with remote servers. In this example we are using python callable functions as tools, but we could also have used MCP servers that run fully locally (e.g. mcp/filesystem)

In [ ]:

Copied!





def read_file(file_path: str) -> str:
    """
    Read a file from the file path given and return its content.

    Args:
        file_path: The path of the file to read.
    """
    try:
        with open(file_path) as file:
            content = file.read()
    except Exception as e:
        content = f"Error reading file: {file_path} \n\n {e}"
    return content
def read_file(file_path: str) -> str:
    """
    Read a file from the file path given and return its content.

    Args:
        file_path: The path of the file to read.
    """
    try:
        with open(file_path) as file:
            content = file.read()
    except Exception as e:
        content = f"Error reading file: {file_path} \n\n {e}"
    return content

Now that you have downloaded your LLM and you have defined your tools, you need to pick your agent framework to build your agent. Note that the agent you'll build with any-agent can be run across multiple agent frameworks (Smolagent, TinyAgent, OpenAI, etc) and across various LLMs (Llama, DeepSeek, Mistral, etc). For this example, we will use the smolagents framework.

In [ ]:

Copied!





import os

from any_agent import AgentConfig, AnyAgent

if os.getenv("IN_PYTEST") == "1":
    # Don't need to worry about this when running locally.
    # We use this for out automated testing https://github.com/mozilla-ai/any-agent/blob/main/.github/workflows/tests-cookbook.yaml
    # HF_ENDPOINT points to a Qwen/Qwen3-1.7B running in https://endpoints.huggingface.co/
    model_id = "huggingface/tgi"
    model_args = {"api_base": os.environ["HF_ENDPOINT"]}
else:
    model_id = "ollama/granite3.3:8b"
    model_args = {"num_ctx": 32000}


agent = AnyAgent.create(
    "tinyagent",
    AgentConfig(
        model_id=model_id,
        instructions="You are an expert summarizer.",
        tools=[read_file],
        model_args=model_args,
    ),
)
import os

from any_agent import AgentConfig, AnyAgent

if os.getenv("IN_PYTEST") == "1":
    # Don't need to worry about this when running locally.
    # We use this for out automated testing https://github.com/mozilla-ai/any-agent/blob/main/.github/workflows/tests-cookbook.yaml
    # HF_ENDPOINT points to a Qwen/Qwen3-1.7B running in https://endpoints.huggingface.co/
    model_id = "huggingface/tgi"
    model_args = {"api_base": os.environ["HF_ENDPOINT"]}
else:
    model_id = "ollama/granite3.3:8b"
    model_args = {"num_ctx": 32000}


agent = AnyAgent.create(
    "tinyagent",
    AgentConfig(
        model_id=model_id,
        instructions="You are an expert summarizer.",
        tools=[read_file],
        model_args=model_args,
    ),
)

Run the Agent¶

In [ ]:

Copied!





from pathlib import Path

abs_path = Path("../../demo/app.py").resolve()
agent_trace = agent.run(
    f"Carefully read the file in the path: {abs_path} and return summary of what the code does."
)
from pathlib import Path

abs_path = Path("../../demo/app.py").resolve()
agent_trace = agent.run(
    f"Carefully read the file in the path: {abs_path} and return summary of what the code does."
)

View the results¶

The agent.run method returns an AgentTrace object, which has a few convenient attributes for displaying some interesting information about the run.

In [ ]:

Copied!

print(agent_trace.final_output)  # Final answer
print(f"Duration: {agent_trace.duration.total_seconds():.2f} seconds")
print(agent_trace.final_output)  # Final answer
print(f"Duration: {agent_trace.duration.total_seconds():.2f} seconds")

In [ ]: