Encoderfile

Deploy Encoder Transformers as self-contained, single-binary executables.


Encoderfile packages transformer encoders—and their classification heads—into a single, self-contained executable.

Replace fragile, multi-gigabyte Python containers with lean, auditable binaries that have zero runtime dependencies¹. Written in Rust and built on ONNX Runtime, Encoderfile ensures strict determinism and high performance for financial platforms, content moderation pipelines, and search infrastructure.

Why Encoderfile?

While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures. It is designed for environments where compliance, latency, and determinism are non-negotiable.

  • Zero Dependencies: No Python, no PyTorch, no network calls. Just a fast, portable binary.
  • Smaller Footprint: Binaries are measured in megabytes, not the gigabytes required for standard container deployments.
  • Protocol Agnostic: Runs as a REST API, gRPC microservice, CLI tool, or MCP Server out of the box.
  • Compliance-Friendly: Deterministic and offline-safe, making it ideal for strict security boundaries.

Note for Windows users: Pre-built binaries are not available for Windows. Please see our guide on building from source for instructions.

Use Cases

  • Microservices: Run as a standalone gRPC or REST service on localhost or in production.
  • AI Agents: Register as an MCP Server to give agents reliable classification tools.
  • Batch Jobs: Use the CLI mode (infer) to process text pipelines without spinning up servers (see the sketch after this list).
  • Edge Deployment: Deploy sentiment analysis, NER, or embeddings anywhere without Docker or Python.
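
The same binary covers all of these scenarios; only the invocation changes. A minimal sketch, with illustrative invocations (see the CLI Reference for the exact flags and arguments):

# Long-running microservice (REST/gRPC/MCP)
./sentiment-analyzer.encoderfile serve

# One-off CLI inference for batch pipelines
./sentiment-analyzer.encoderfile infer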

Supported Models

Encoderfile supports encoder-only transformers for:

  • Token Embeddings - Per-token representations (BERT, DistilBERT, RoBERTa)
  • Sequence Classification - Sentiment analysis, topic classification
  • Token Classification - Named Entity Recognition, PII detection
  • Sentence Embeddings - Semantic search, clustering
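
Each of these model types maps onto a standard optimum-cli export task. The commands below are a sketch using the usual Optimum task names; check the Building Guide for the exact task Encoderfile expects for your model:

# Sequence classification (e.g. sentiment analysis, topic classification)
optimum-cli export onnx --model <model-id> --task text-classification ./model

# Token classification (e.g. NER, PII detection)
optimum-cli export onnx --model <model-id> --task token-classification ./model

# Token or sentence embeddings
optimum-cli export onnx --model <model-id> --task feature-extraction ./model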

Generative models (GPT, T5) are not supported. See the CLI Reference for complete model type details.

Quick Start

1. Install CLI

Download the pre-built CLI tool:

curl -fsSL https://raw.githubusercontent.com/mozilla-ai/encoderfile/main/install.sh | sh

Or build from source (see Building Guide).
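
To confirm the CLI installed correctly, print its help text (assuming the standard --help flag):

encoderfile --help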

2. Export Model & Build

Export a HuggingFace model and build it into a binary:

# Export to ONNX
optimum-cli export onnx --model <model-id> --task <task> ./model

# Build the encoderfile
encoderfile build -f config.yml

See the Building Guide for detailed export options and configuration.
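
As a concrete example (the model ID below is only an illustration; any encoder-only model from the supported list works), a sentiment model could be exported and built like this:

# Export a public sentiment-analysis checkpoint to ONNX
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification ./model

# Package it into a single binary
encoderfile build -f config.yml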

3. Run & Test

Start the server and make predictions:

# Start server
./build/sentiment-analyzer.encoderfile serve

# Make a prediction
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["Your text here"]}'

See the API Reference for complete endpoint documentation.

Next Steps: Try the Token Classification Cookbook for a complete walkthrough.

How It Works

Encoderfile compiles your model into a self-contained binary by embedding ONNX weights, tokenizer, and config directly into Rust code. The result is a portable executable with zero runtime dependencies.
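
Because everything is baked in at build time, the shipped artifact is a single file. On Linux you can sanity-check this with standard tools (the binary name is illustrative; note the glibc requirement described in footnote 1):

# Shared-library dependencies: only system libraries such as glibc should appear
ldd ./build/sentiment-analyzer.encoderfile

# Binary size: dominated by the embedded ONNX weights
du -h ./build/sentiment-analyzer.encoderfile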

[Architecture diagram: the build process compiles the ONNX model, tokenizer, and config into a single binary executable that runs as a zero-dependency gRPC, HTTP, or MCP server.]

Documentation

Reference

  • CLI Reference - Full documentation for build, serve, and infer commands
  • API Reference - REST, gRPC, and MCP endpoint specifications


  1. Standard builds of Encoderfile require glibc to run because of ONNX Runtime. See this issue for progress on building Encoderfile for musl-based Linux.