Using encoderfile from agents
The Model Context Protocol (MCP), introduced by Anthropic, has proven to be a popular way of giving an AI agent access to a variety of tools. This Hugging Face blog post has a nice explanation of MCP.
In the following example we will use Mozilla's own any-agent and any-llm packages to build a small agent that leverages the capabilities provided by a test encoderfile.
Build the custom encoderfile and start the server
We will use the existing test config to build an encoderfile from one of the test models by Mozilla.ai. It detects Personally Identifiable Information (PII) and tags it accordingly, using tags like B-SURNAME for, well, surnames, and O for non-PII tokens. As we will see, even though the raw output consists of logits and tags, the underlying LLM is usually robust enough to focus only on the tags and act appropriately.
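To make the tag format concrete, here is a rough sketch of what such token-level output might look like and how the tags can drive redaction. The token boundaries and the B-GIVENNAME tag are assumptions for illustration; only B-SURNAME and O come from the description above:

```python
# Hypothetical token-level output for "My name is Javier Torres".
# "O" marks non-PII tokens; "B-SURNAME" marks the start of a surname
# span. The B-GIVENNAME tag is made up here for illustration.
tagged = [
    ("My", "O"),
    ("name", "O"),
    ("is", "O"),
    ("Javier", "B-GIVENNAME"),
    ("Torres", "B-SURNAME"),
]

# Redact only surnames, which is exactly what we will ask the agent to do
redacted = " ".join(
    "[REDACTED]" if tag == "B-SURNAME" else tok for tok, tag in tagged
)
print(redacted)  # My name is Javier [REDACTED]
```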
curl -fsSL https://raw.githubusercontent.com/mozilla-ai/encoderfile/main/install.sh | sh
encoderfile build -f test_config.yml
After building it, we only need to run it in MCP mode so it listens for requests. By default it binds to all interfaces on port 9100.
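If you want to check that the server is up before involving an agent, recall that MCP speaks JSON-RPC 2.0 over HTTP, so a `tools/list` request is just a small POST body. This is a generic MCP sketch, not an encoderfile-specific API; the URL matches the one we give the agent below:

```python
import json

# Standard MCP "tools/list" request, as a JSON-RPC 2.0 payload
url = "http://localhost:9100/mcp"
payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
body = json.dumps(payload)

# POST `body` to `url` (e.g. with curl or urllib) while the server is
# running to see which tools the encoderfile exposes.
print(body)
```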
Install Dependencies
For this test, we will need the any-agent and any-llm Python packages:
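The usual pip invocation works. The distribution names below are our best guess at the PyPI package names, so check each project's README if installation fails:

```shell
pip install any-agent any-llm-sdk
```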
Write the agent
Now we will write an agent with the appropriate prompt. We instruct the agent to use the provided tool, since its current description is fairly generic, and not to rely on metadata that it might consider useful but that is not documented anywhere in the tool itself. We also instruct it to replace only surnames, to showcase that the tags can be extracted appropriately:
import os
from getpass import getpass

if "MISTRAL_API_KEY" not in os.environ:
    print("MISTRAL_API_KEY not found in environment!")
    api_key = getpass("Please enter your MISTRAL_API_KEY: ")
    os.environ["MISTRAL_API_KEY"] = api_key
    print("MISTRAL_API_KEY set for this session!")
else:
    print("MISTRAL_API_KEY found in environment.")
from any_agent import AgentConfig, AnyAgent
from any_agent.config import MCPStreamableHttp
async def main():
    print("Start creating agent")
    eftool = MCPStreamableHttp(url="http://localhost:9100/mcp")
    try:
        agent = await AnyAgent.create_async(
            "tinyagent",  # See all options in https://mozilla-ai.github.io/any-agent/
            AgentConfig(model_id="mistral:mistral-large-latest", tools=[eftool]),
        )
    except Exception as e:
        print(f"❌ Failed to create agent: {e}")
        return
    print("Done creating agent")
    prompt = """
    Use the eftool tool to remove the personal information from this line: "My name is Javier Torres".
    Do not use any metadata. The "inputs" param must be a sequence with one string.
    Replace each surname, but not given names, with [REDACTED].
    """
    agent_trace = await agent.run_async(prompt)
    print(agent_trace.final_output)
    await agent.cleanup_async()


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
After some struggling with the calling conventions, the LLM finally obtains the tags from the encoderfile and acts accordingly:
My name is Javier [REDACTED]