Encoderfile

Deploy Encoder Transformers as self-contained, single-binary executables.


Encoderfile packages transformer encoders—and their classification heads—into a single, self-contained executable.

Replace fragile, multi-gigabyte Python containers with lean, auditable binaries that have zero runtime dependencies¹. Written in Rust and built on ONNX Runtime, Encoderfile ensures strict determinism and high performance for financial platforms, content moderation pipelines, and search infrastructure.

Why Encoderfile?

While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures. It is designed for environments where compliance, latency, and determinism are non-negotiable.

  • Zero Dependencies: No Python, no PyTorch, no network calls. Just a fast, portable binary.
  • Smaller Footprint: Binaries are measured in megabytes, not the gigabytes required for standard container deployments.
  • Protocol Agnostic: Runs as a REST API, gRPC microservice, CLI tool, or MCP Server out of the box.
  • Compliance-Friendly: Deterministic and offline-safe, making it ideal for strict security boundaries.

Note for Windows users: Pre-built binaries are not available for Windows. Please see our guide on building from source for instructions.

Use Cases

  • Microservices: Run as a standalone gRPC or REST service on localhost or in production.
  • AI Agents: Register as an MCP Server to give agents reliable classification tools.
  • Batch Jobs: Use the CLI mode (infer) to process text pipelines without spinning up servers (see the sketch after this list).
  • Edge Deployment: Deploy sentiment analysis, NER, or embeddings anywhere without Docker or Python.
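
The same binary covers all of these scenarios; only the invocation changes. A minimal sketch, with illustrative invocations (see the CLI Reference for the exact flags and arguments):

# Long-running microservice (REST/gRPC/MCP)
./sentiment-analyzer.encoderfile serve

# One-off CLI inference for batch pipelines
./sentiment-analyzer.encoderfile infer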

Supported Models

Encoderfile supports encoder-only transformers for:

  • Token Embeddings - Per-token representations (BERT, DistilBERT, RoBERTa)
  • Sequence Classification - Sentiment analysis, topic classification
  • Token Classification - Named Entity Recognition, PII detection
  • Sentence Embeddings - Semantic search, clustering
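
Each of these model types maps onto a standard optimum-cli export task. The commands below are a sketch using the usual Optimum task names; check the Building Guide for the exact task Encoderfile expects for your model:

# Sequence classification (e.g. sentiment analysis, topic classification)
optimum-cli export onnx --model <model-id> --task text-classification ./model

# Token classification (e.g. NER, PII detection)
optimum-cli export onnx --model <model-id> --task token-classification ./model

# Token or sentence embeddings
optimum-cli export onnx --model <model-id> --task feature-extraction ./model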

Generative models (GPT, T5) are not supported. See the CLI Reference for complete model type details.

Quick Start

1. Install CLI

Download the pre-built CLI tool:

curl -fsSL https://raw.githubusercontent.com/mozilla-ai/encoderfile/main/install.sh | sh

Or build from source (see Building Guide).
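
To confirm the CLI installed correctly, print its help text (assuming the standard --help flag):

encoderfile --help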

2. Export Model & Build

Export a HuggingFace model and build it into a binary:

# Export to ONNX
optimum-cli export onnx --model <model-id> --task <task> ./model

# Build the encoderfile
encoderfile build -f config.yml

See the Building Guide for detailed export options and configuration.
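
As a concrete example (the model ID below is only an illustration; any encoder-only model from the supported list works), a sentiment model could be exported and built like this:

# Export a public sentiment-analysis checkpoint to ONNX
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification ./model

# Package it into a single binary
encoderfile build -f config.yml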

3. Run & Test

Start the server and make predictions:

# Start server
./build/sentiment-analyzer.encoderfile serve

# Make a prediction
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["Your text here"]}'

See the API Reference for complete endpoint documentation.

Next Steps: Try the Token Classification Cookbook for a complete walkthrough.

How It Works

Encoderfile compiles your model into a self-contained binary by embedding ONNX weights, tokenizer, and config directly into Rust code. The result is a portable executable with zero runtime dependencies.
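
Because everything is baked in at build time, the shipped artifact is a single file. On Linux you can sanity-check this with standard tools (the binary name is illustrative; note the glibc requirement described in footnote 1):

# Shared-library dependencies: only system libraries such as glibc should appear
ldd ./build/sentiment-analyzer.encoderfile

# Binary size: dominated by the embedded ONNX weights
du -h ./build/sentiment-analyzer.encoderfile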

[Architecture diagram: the build process compiles the ONNX model, tokenizer, and config into a single binary executable that runs as a zero-dependency gRPC, HTTP, or MCP server.]

Documentation

Reference

  • CLI Reference - Full documentation for build, serve, and infer commands
  • API Reference - REST, gRPC, and MCP endpoint specifications


  1. Standard builds of Encoderfile require glibc to run because of ONNX Runtime. See this issue for progress on building Encoderfile for musl-based Linux.