Encoderfile CLI Documentation
Overview
Encoderfile provides two command-line tools:
- `cli` - Rust-based build tool for creating encoderfile binaries from ONNX models
- `encoderfile` - Rust-based runtime binary for serving models and running inference
Build Tool: cli
The cli build command compiles HuggingFace transformer models (with ONNX weights) into self-contained executable binaries using a YAML configuration file.
build
Validates a model configuration and builds a self-contained Rust binary with embedded model assets.
Usage
# If you haven't installed the CLI tool yet, build it first:
cargo build --bin encoderfile --release
# Then run it:
./target/release/encoderfile build -f <config.yml> [OPTIONS]
# Or install it to your system:
cargo install --path encoderfile --bin encoderfile
encoderfile build -f <config.yml> [OPTIONS]
Options
| Option | Short | Type | Required | Description |
|---|---|---|---|---|
| `--config` | `-f` | Path | Yes | Path to YAML configuration file |
| `--output-dir` | - | Path | No | Override output directory from config |
| `--cache-dir` | - | Path | No | Override cache directory from config |
| `--no-build` | - | Flag | No | Generate project files without building |
Configuration File Format
Create a YAML configuration file (e.g., config.yml) with the following structure:
encoderfile:
  # Model identifier (used in API responses)
  name: my-model

  # Model version (optional, defaults to "0.1.0")
  version: "1.0.0"

  # Path to model directory or explicit file paths
  path: ./models/my-model
  # OR specify files explicitly:
  # path:
  #   model_config_path: ./models/config.json
  #   model_weights_path: ./models/model.onnx
  #   tokenizer_path: ./models/tokenizer.json

  # Model type: embedding, sequence_classification, or token_classification
  model_type: embedding

  # Output path (optional, defaults to ./<name>.encoderfile in current directory)
  output_path: ./build/my-model.encoderfile

  # Cache directory (optional, defaults to system cache)
  cache_dir: ~/.cache/encoderfile

  # Optional transform (Lua script for post-processing)
  transform:
    path: ./transforms/normalize.lua
  # OR inline transform:
  # transform: "return normalize(output)"

  # Whether to build the binary (optional, defaults to true)
  build: true
Model Types
- `embedding` - For models using `AutoModel` or `AutoModelForMaskedLM`. Outputs: `last_hidden_state` with shape `[batch_size, sequence_length, hidden_size]`
- `sequence_classification` - For models using `AutoModelForSequenceClassification`. Outputs: `logits` with shape `[batch_size, num_labels]`
- `token_classification` - For models using `AutoModelForTokenClassification`. Outputs: `logits` with shape `[batch_size, num_tokens, num_labels]`
Examples
Build an embedding model:
Create embedding-config.yml:
encoderfile:
  name: sentence-embedder
  version: "1.0.0"
  path: ./models/all-MiniLM-L6-v2
  model_type: embedding
  output_path: ./build/sentence-embedder.encoderfile
Build:
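encoderfile build -f embedding-config.yml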
Build a sentiment classifier:
Create sentiment-config.yml:
encoderfile:
  name: sentiment-analyzer
  path: ./models/distilbert-sst2
  model_type: sequence_classification
Build:
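encoderfile build -f sentiment-config.yml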
Build a NER model with transform:
Create ner-config.yml:
encoderfile:
  name: ner-tagger
  path: ./models/bert-ner
  model_type: token_classification
  transform:
    path: ./transforms/softmax_logits.lua
Build:
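encoderfile build -f ner-config.yml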
Generate without building:
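encoderfile build -f config.yml --no-build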
Override output directory:
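encoderfile build -f config.yml --output-dir ./build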
Build Process
The build command performs the following steps:
- Loads configuration - Parses the YAML config file
- Validates model files - Checks for required files:
  - `model.onnx` - ONNX model weights (or path specified in config)
  - `tokenizer.json` - Tokenizer configuration (or path specified in config)
  - `config.json` - Model configuration (or path specified in config)
- Validates ONNX model - Checks the ONNX model structure and compatibility
- Generates project - Creates a new Rust project in the cache directory with:
  - `main.rs` - Generated from Tera templates
  - `Cargo.toml` - Generated with proper dependencies
- Embeds assets - Uses the `factory!` macro to embed model files at compile time
- Compiles binary - Runs `cargo build --release` on the generated project
- Outputs binary - Copies the binary to the specified output path
Output
Upon successful build, you'll find the binary at the path specified in output_path.
If `output_path` is not specified, the binary defaults to `./<name>.encoderfile` in the current directory.
For example, with `name: my-model` and `output_path: ./build/my-model.encoderfile`, the binary is written to `./build/my-model.encoderfile`; with no `output_path`, it would be written to `./my-model.encoderfile`.
This binary is completely self-contained and includes:
- ONNX model weights (embedded at compile time)
- Tokenizer configuration (embedded)
- Model metadata (embedded)
- Full inference runtime
Requirements
Before building, ensure you have:
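- A working Rust toolchain (rustc and cargo)
- Model files exported to ONNX format (model.onnx, tokenizer.json, config.json), for example via optimum-cli export onnx
- A YAML configuration file as described in Configuration File Format above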
Troubleshooting
Error: "No such file: model.onnx"
Solution: Ensure your model directory contains ONNX weights.
Export with: optimum-cli export onnx --model <model_id> --task <task> <output_dir>
Error: "Could not locate model config at path"
Solution: The model directory is missing required files.
Ensure the directory contains: config.json, tokenizer.json, and model.onnx
Error: "No such directory"
Solution: The path specified in the config file doesn't exist.
Check the path value in your YAML config.
Error: "cargo build failed"
Solution: Check that Rust and required system dependencies are installed.
Run: rustc --version && cargo --version
Error: "Cannot locate cache directory"
Solution: System cannot determine the cache directory.
Specify an explicit cache_dir in your config file.
version
Prints the encoderfile version.
Usage
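./target/release/encoderfile version
# Or, if installed:
encoderfile version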
Output
Runtime Binary: encoderfile
After building with the cli tool, the resulting .encoderfile binary provides inference capabilities.
Architecture
The runtime CLI is built with the following components:
- Server Mode: Hosts models via HTTP and/or gRPC endpoints
- Inference Mode: Performs one-off inference operations from the command line
- Multi-Model Support: Automatically detects and routes to the appropriate model type
Commands
serve
Starts the encoderfile server with HTTP and/or gRPC endpoints for model inference.
Usage
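<model>.encoderfile serve [OPTIONS]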
Options
| Option | Type | Default | Description |
|---|---|---|---|
| `--grpc-hostname` | String | `[::]` | Hostname/IP address for the gRPC server |
| `--grpc-port` | String | `50051` | Port for the gRPC server |
| `--http-hostname` | String | `0.0.0.0` | Hostname/IP address for the HTTP server |
| `--http-port` | String | `8080` | Port for the HTTP server |
| `--disable-grpc` | Boolean | `false` | Disable the gRPC server |
| `--disable-http` | Boolean | `false` | Disable the HTTP server |
Examples
Start both HTTP and gRPC servers (default):
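encoderfile serve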
Start only HTTP server:
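encoderfile serve --disable-grpc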
Start only gRPC server:
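encoderfile serve --disable-http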
Custom ports and hostnames:
encoderfile serve \
--http-hostname 127.0.0.1 \
--http-port 3000 \
--grpc-hostname localhost \
--grpc-port 50052
Notes
- At least one server type (HTTP or gRPC) must be enabled
- The server will display a banner upon successful startup
- Both servers run concurrently using async tasks
infer
Performs inference on input text using the configured model. The model type is automatically detected based on configuration.
Usage
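<model>.encoderfile infer <INPUTS>... [OPTIONS]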
Arguments
| Argument | Required | Description |
|---|---|---|
| `<INPUTS>` | Yes | One or more text strings to process |
Options
| Option | Type | Default | Description |
|---|---|---|---|
| `-f, --format` | Enum | `json` | Output format (currently only JSON supported) |
| `-o, --out-dir` | String | None | Output file path; if not provided, prints to stdout |
Model Types
The inference behavior depends on the model type configured:
1. Embedding Models
Generates vector embeddings for input text.
Example:
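encoderfile infer "Hello world"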
With normalization disabled:
2. Sequence Classification Models
Classifies entire sequences (e.g., sentiment analysis, topic classification).
Example:
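encoderfile infer "This is amazing!"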
3. Token Classification Models
Labels individual tokens (e.g., Named Entity Recognition, Part-of-Speech tagging).
Example:
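encoderfile infer "Apple Inc. is located in Cupertino, California"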
Output Formats
Currently, only JSON format is supported (--format json). The output structure varies by model type:
Embedding Output
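The exact embedding payload is not reproduced here; assuming it follows the same predictions/metadata envelope as the classification outputs, a response looks roughly like the following (the embedding field name and the values are illustrative):
{
  "predictions": [
    {
      "embedding": [0.0123, -0.0456, 0.0789]
    }
  ],
  "metadata": null
}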
Sequence Classification Output
{
"predictions": [
{
"label": "POSITIVE",
"score": 0.9876
},
{
"label": "NEGATIVE",
"score": 0.8765
}
],
"metadata": null
}
Token Classification Output
{
"predictions": [
{
"tokens": ["Apple", "Inc.", "is", "located", "in", "Cupertino", ",", "California"],
"labels": ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O", "B-LOC"]
}
],
"metadata": null
}
Saving Output to File
Save results to a file:
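encoderfile infer "Hello world" -o embedding.json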
Process multiple inputs and save:
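encoderfile infer \
  "First document to analyze" \
  "Second document to analyze" \
  --out-dir batch_results.json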
Configuration
The CLI relies on external configuration to determine:
- Model type (Embedding, SequenceClassification, TokenClassification)
- Model path and parameters
- Server settings
Ensure your configuration is properly set before running commands. Refer to the main encoderfile configuration documentation for details.
Error Handling
The CLI will return appropriate error messages for:
- Invalid configuration (e.g., both servers disabled)
- Missing required arguments
- Model loading failures
- Inference errors
- File I/O errors
Examples
Basic Inference Workflow
# Set up configuration (example)
export MODEL_PATH=/path/to/model
export MODEL_TYPE=embedding
# Run inference
encoderfile infer "Hello world"
# Save to file
encoderfile infer "Hello world" -o embedding.json
Server Workflow
# Terminal 1: Start server
encoderfile serve --http-port 8080
# Terminal 2: Make HTTP requests (using curl)
curl -X POST http://localhost:8080/infer \
-H "Content-Type: application/json" \
-d '{"inputs": ["Hello world"], "normalize": true}'
Batch Processing
# Process multiple inputs at once
encoderfile infer \
"First document to analyze" \
"Second document to analyze" \
"Third document to analyze" \
--out-dir batch_results.json
Custom Server Configuration
# Run on specific network interface with custom ports
encoderfile serve \
--http-hostname 192.168.1.100 \
--http-port 3000 \
--grpc-hostname 192.168.1.100 \
--grpc-port 50052
Troubleshooting
Both servers cannot be disabled
Error: "Cannot disable both gRPC and HTTP"
Solution: Enable at least one server type:
encoderfile serve --disable-grpc # Keep HTTP enabled
# OR
encoderfile serve --disable-http # Keep gRPC enabled
Output not appearing
If output isn't visible, check:
1. Ensure you're not redirecting output to a file unintentionally
2. Check file permissions if using --out-dir
3. Verify the model is correctly configured
Model type detection
The CLI automatically detects model type from configuration. If inference behaves unexpectedly:
1. Verify your model configuration
2. Ensure the model type matches your use case
3. Check model compatibility
Complete Workflow Example
Here's a complete workflow from model export to deployment:
Step 1: Export Model to ONNX
# Export a HuggingFace model to ONNX format
optimum-cli export onnx \
--model distilbert-base-uncased-finetuned-sst-2-english \
--task text-classification \
./models/sentiment-model
Step 2: Create Configuration File
Create sentiment-config.yml:
encoderfile:
  name: sentiment-analyzer
  version: "1.0.0"
  path: ./models/sentiment-model
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile
Step 3: Build Encoderfile Binary
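encoderfile build -f sentiment-config.yml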
This creates: ./build/sentiment-analyzer.encoderfile
Step 4: Run Inference
Option A: Start server and use HTTP/gRPC
# Start server
./build/sentiment-analyzer.encoderfile serve
# In another terminal - use HTTP API
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"inputs": ["This is amazing!", "This is terrible"]}'
Option B: Direct CLI inference
# Single inference
./build/sentiment-analyzer.encoderfile infer "This is amazing!"
# Batch inference
./build/sentiment-analyzer.encoderfile infer \
"This is amazing!" \
"This is terrible" \
"This is okay" \
-o results.json
Step 5: Deploy
# Copy binary to deployment location
cp ./build/sentiment-analyzer.encoderfile /usr/local/bin/sentiment-analyzer
# The binary is self-contained - no dependencies needed!
sentiment-analyzer serve --http-port 8080
Command Reference Summary
| Command | Tool | Purpose |
|---|---|---|
| `./target/release/encoderfile build -f config.yml` | encoderfile | Build self-contained binary from ONNX model |
| `./target/release/encoderfile version` | encoderfile | Print version information |
| `<model>.encoderfile serve` | encoderfile | Start HTTP/gRPC inference server |
| `<model>.encoderfile infer` | encoderfile | Run single inference from command line |
| `<model>.encoderfile mcp` | encoderfile | Start MCP server |
Additional Resources
- Getting Started Guide - Step-by-step tutorial
- API Reference - HTTP/gRPC/MCP API documentation
- BUILDING.md - Complete build guide with advanced configuration
- GitHub Repository - Source code and issues