# Encoderfile

Deploy Encoder Transformers as self-contained, single-binary executables.
Encoderfile packages transformer encoders—and their classification heads—into a single, self-contained executable.
Replace fragile, multi-gigabyte Python containers with lean, auditable binaries that have zero runtime dependencies.[^1] Written in Rust and built on ONNX Runtime, Encoderfile ensures strict determinism and high performance for financial platforms, content moderation pipelines, and search infrastructure.
## Why Encoderfile?
While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures. It is designed for environments where compliance, latency, and determinism are non-negotiable.
- Zero Dependencies: No Python, no PyTorch, no network calls. Just a fast, portable binary.
- Smaller Footprint: Binaries are measured in megabytes, not the gigabytes required for standard container deployments.
- Protocol Agnostic: Runs as a REST API, gRPC microservice, CLI tool, or MCP Server out of the box.
- Compliance-Friendly: Deterministic and offline-safe, making it ideal for strict security boundaries.
Note for Windows users: pre-built binaries are not available for Windows; see our guide on building from source for instructions.
## Use Cases
| Scenario | Application |
|---|---|
| Microservices | Run as a standalone gRPC or REST service on localhost or in production. |
| AI Agents | Register as an MCP Server to give agents reliable classification tools. |
| Batch Jobs | Use the CLI mode (`infer`) to process text pipelines without spinning up servers (see the sketch after this table). |
| Edge Deployment | Deploy sentiment analysis, NER, or embeddings anywhere without Docker or Python. |
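The `infer` subcommand runs the model directly from the shell, which makes it easy to wire into pipelines. The invocation below is a sketch of a built binary (here `sentiment-analyzer.encoderfile`, as in the Quick Start below); the exact arguments are documented in the CLI Reference, so treat the flags as assumptions:

```bash
# Hypothetical one-shot invocation — check your binary's infer --help for the real flags.
./sentiment-analyzer.encoderfile infer "The market rallied today."
```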
## Supported Models
Encoderfile supports encoder-only transformers for:
- Token Embeddings - Per-token vector representations for downstream tasks (BERT, DistilBERT, RoBERTa)
- Sequence Classification - Sentiment analysis, topic classification
- Token Classification - Named Entity Recognition, PII detection
- Sentence Embeddings - Semantic search, clustering
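These heads correspond to standard ONNX export tasks in HuggingFace Optimum. The commands below are illustrative pairings of public models with `--task` values; verify the task names against `optimum-cli export onnx --help` for your installed Optimum version:

```bash
# Illustrative model/task pairings for the heads above (task names are
# Optimum's; double-check them against your installed version):
optimum-cli export onnx --model distilbert-base-uncased --task feature-extraction ./model  # embeddings
optimum-cli export onnx --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification ./model                                                       # sequence classification
optimum-cli export onnx --model dslim/bert-base-NER --task token-classification ./model    # NER
```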
Generative models (GPT, T5) are not supported. See the CLI Reference for complete model-type details.
## Quick Start
### 1. Install CLI
Download the pre-built CLI tool:
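(The URL below is a placeholder, not a real download link; grab the actual release asset for your platform from the project's releases page.)

```bash
# Placeholder URL — substitute the real release asset for your OS/architecture.
curl -LO https://github.com/<org>/encoderfile/releases/latest/download/encoderfile
chmod +x encoderfile
./encoderfile --version   # confirm the install (assuming a standard --version flag)
```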
Or build from source (see Building Guide).
### 2. Export Model & Build
Export a HuggingFace model and build it into a binary:
```bash
# Export to ONNX
optimum-cli export onnx --model <model-id> --task <task> ./model

# Build the encoderfile
encoderfile build -f config.yml
```
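For orientation, a build config might look something like the following. The field names here are assumptions, not the documented schema; the Building Guide has the authoritative reference:

```bash
# Write an illustrative config.yml (field names are assumptions, not the real schema):
cat > config.yml <<'EOF'
name: sentiment-analyzer   # output binary name (matches the example below)
model_path: ./model        # directory produced by optimum-cli above
output_dir: ./build        # where the .encoderfile binary lands
EOF
```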
See the Building Guide for detailed export options and configuration.
### 3. Run & Test
Start the server and make predictions:
```bash
# Start server
./build/sentiment-analyzer.encoderfile serve

# Make a prediction
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["Your text here"]}'
```
See the API Reference for complete endpoint documentation.
Next Steps: Try the Token Classification Cookbook for a complete walkthrough.
## How It Works
Encoderfile compiles your model into a self-contained binary by embedding ONNX weights, tokenizer, and config directly into Rust code. The result is a portable executable with zero runtime dependencies.
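You can sanity-check the self-contained claim with standard tooling (assuming a Linux build; per the footnote, glibc is the one system-level requirement):

```bash
# Verify the binary stands alone: only system libraries should appear.
ldd ./build/sentiment-analyzer.encoderfile    # expect only system libraries such as glibc
file ./build/sentiment-analyzer.encoderfile   # a single native executable
du -h ./build/sentiment-analyzer.encoderfile  # megabytes, not gigabytes
```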

## Documentation
### Getting Started
- Installation & Setup - Complete setup guide from installation to first deployment
- Building Guide - Export models and configure builds
### Tutorials
- Token Classification (NER) - Build a Named Entity Recognition system
- Transforms Guide - Custom post-processing with Lua scripts
### Reference
- CLI Reference - Full documentation for the `build`, `serve`, and `infer` commands
- API Reference - REST, gRPC, and MCP endpoint specifications
## Community & Support
- GitHub Issues - Report bugs or request features
- Contributing Guide - Learn how to contribute
- Code of Conduct - Community guidelines
[^1]: Standard builds of Encoderfile require glibc to run because of ONNX Runtime. See this issue for progress on building Encoderfile for musl-based Linux.