Encoderfile CLI Documentation
Overview
Encoderfile provides two command-line tools:
- `cli` - Rust-based build tool for creating encoderfile binaries from ONNX models
- `encoderfile` - Rust-based runtime binary for serving models and running inference
Build Tool: cli
The cli build command compiles HuggingFace transformer models (with ONNX weights) into self-contained executable binaries using a YAML configuration file.
build
Validates a model configuration and builds a self-contained Rust binary with embedded model assets.
Usage
# If you haven't installed the CLI tool yet, build it first:
cargo build --bin encoderfile --release
# Then run it:
./target/release/encoderfile build -f <config.yml> [OPTIONS]
# Or install it to your system:
cargo install --path encoderfile --bin encoderfile
encoderfile build -f <config.yml> [OPTIONS]
Options
| Option | Short | Type | Required | Description |
|---|---|---|---|---|
| `--config` | `-f` | Path | Yes | Path to YAML configuration file |
| `--output-dir` | - | Path | No | Override output directory from config |
| `--cache-dir` | - | Path | No | Override cache directory from config |
| `--no-build` | - | Flag | No | Generate project files without building |
Configuration File Format
Create a YAML configuration file (e.g., config.yml) with the following structure:
encoderfile:
  # Model identifier (used in API responses)
  name: my-model

  # Model version (optional, defaults to "0.1.0")
  version: "1.0.0"

  # Path to model directory or explicit file paths
  path: ./models/my-model
  # OR specify files explicitly:
  # path:
  #   model_config_path: ./models/config.json
  #   model_weights_path: ./models/model.onnx
  #   tokenizer_path: ./models/tokenizer.json

  # Model type: embedding, sequence_classification, or token_classification
  model_type: embedding

  # Output path (optional, defaults to ./<name>.encoderfile in current directory)
  output_path: ./build/my-model.encoderfile

  # Cache directory (optional, defaults to system cache)
  cache_dir: ~/.cache/encoderfile

  # Optional transform (Lua script for post-processing)
  transform:
    path: ./transforms/normalize.lua
  # OR inline transform:
  # transform: "return normalize(output)"

  # Whether to build the binary (optional, defaults to true)
  build: true
Model Types
- `embedding` - For models using `AutoModel` or `AutoModelForMaskedLM`. Outputs: `last_hidden_state` with shape `[batch_size, sequence_length, hidden_size]`
- `sequence_classification` - For models using `AutoModelForSequenceClassification`. Outputs: `logits` with shape `[batch_size, num_labels]`
- `token_classification` - For models using `AutoModelForTokenClassification`. Outputs: `logits` with shape `[batch_size, num_tokens, num_labels]`
Examples
Build an embedding model:
Create embedding-config.yml:
encoderfile:
  name: sentence-embedder
  version: "1.0.0"
  path: ./models/all-MiniLM-L6-v2
  model_type: embedding
  output_path: ./build/sentence-embedder.encoderfile
Build:
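encoderfile build -f embedding-config.yml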
Build a sentiment classifier:
Create sentiment-config.yml:
encoderfile:
  name: sentiment-analyzer
  path: ./models/distilbert-sst2
  model_type: sequence_classification
Build:
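encoderfile build -f sentiment-config.yml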
Build a NER model with transform:
Create ner-config.yml:
encoderfile:
  name: ner-tagger
  path: ./models/bert-ner
  model_type: token_classification
  transform:
    path: ./transforms/softmax_logits.lua
Build:
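encoderfile build -f ner-config.yml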
Generate without building:
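encoderfile build -f config.yml --no-build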
Override output directory:
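encoderfile build -f config.yml --output-dir ./build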
Build Process
The build command performs the following steps:
- Loads configuration - Parses the YAML config file
- Validates model files - Checks for required files:
  - `model.onnx` - ONNX model weights (or path specified in config)
  - `tokenizer.json` - Tokenizer configuration (or path specified in config)
  - `config.json` - Model configuration (or path specified in config)
- Validates ONNX model - Checks the ONNX model structure and compatibility
- Generates project - Creates a new Rust project in the cache directory with:
  - `main.rs` - Generated from Tera templates
  - `Cargo.toml` - Generated with proper dependencies
- Embeds assets - Uses the `factory!` macro to embed model files at compile time
- Compiles binary - Runs `cargo build --release` on the generated project
- Outputs binary - Copies the binary to the specified output path
Output
Upon successful build, you'll find the binary at the path specified in output_path.
If `output_path` is not specified, the binary defaults to `./<name>.encoderfile` in the current directory.
For example, with `name: my-model` and `output_path: ./build/my-model.encoderfile`, the binary is written to `./build/my-model.encoderfile`; with no `output_path`, it would be written to `./my-model.encoderfile`.
This binary is completely self-contained and includes:
- ONNX model weights (embedded at compile time)
- Tokenizer configuration (embedded)
- Model metadata (embedded)
- Full inference runtime
Requirements
Before building, ensure you have:
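- A working Rust toolchain (rustc and cargo)
- Model files exported to ONNX format (model.onnx, tokenizer.json, config.json), for example via optimum-cli export onnx
- A YAML configuration file as described in Configuration File Format above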
Troubleshooting
Error: "No such file: model.onnx"
Solution: Ensure your model directory contains ONNX weights.
Export with: optimum-cli export onnx --model <model_id> --task <task> <output_dir>
Error: "Could not locate model config at path"
Solution: The model directory is missing required files.
Ensure the directory contains: config.json, tokenizer.json, and model.onnx
Error: "No such directory"
Solution: The path specified in the config file doesn't exist.
Check the path value in your YAML config.
Error: "cargo build failed"
Solution: Check that Rust and required system dependencies are installed.
Run: rustc --version && cargo --version
Error: "Cannot locate cache directory"
Solution: System cannot determine the cache directory.
Specify an explicit cache_dir in your config file.
version
Prints the encoderfile version.
Usage
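./target/release/encoderfile version
# Or, if installed:
encoderfile version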
Output
Runtime Binary: encoderfile
After building with the cli tool, the resulting .encoderfile binary provides inference capabilities.
Architecture
The runtime CLI is built with the following components:
- Server Mode: Hosts models via HTTP and/or gRPC endpoints
- Inference Mode: Performs one-off inference operations from the command line
- Multi-Model Support: Automatically detects and routes to the appropriate model type
Commands
serve
Starts the encoderfile server with HTTP and/or gRPC endpoints for model inference.
Usage
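<model>.encoderfile serve [OPTIONS]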
Options
| Option | Type | Default | Description |
|---|---|---|---|
| `--grpc-hostname` | String | `[::]` | Hostname/IP address for the gRPC server |
| `--grpc-port` | String | `50051` | Port for the gRPC server |
| `--http-hostname` | String | `0.0.0.0` | Hostname/IP address for the HTTP server |
| `--http-port` | String | `8080` | Port for the HTTP server |
| `--disable-grpc` | Boolean | `false` | Disable the gRPC server |
| `--disable-http` | Boolean | `false` | Disable the HTTP server |
Examples
Start both HTTP and gRPC servers (default):
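encoderfile serve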
Start only HTTP server:
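encoderfile serve --disable-grpc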
Start only gRPC server:
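encoderfile serve --disable-http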
Custom ports and hostnames:
encoderfile serve \
--http-hostname 127.0.0.1 \
--http-port 3000 \
--grpc-hostname localhost \
--grpc-port 50052
Notes
- At least one server type (HTTP or gRPC) must be enabled
- The server will display a banner upon successful startup
- Both servers run concurrently using async tasks
infer
Performs inference on input text using the configured model. The model type is automatically detected based on configuration.
Usage
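<model>.encoderfile infer <INPUTS>... [OPTIONS]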
Arguments
| Argument | Required | Description |
|---|---|---|
| `<INPUTS>` | Yes | One or more text strings to process |
Options
| Option | Type | Default | Description |
|---|---|---|---|
| `-f, --format` | Enum | `json` | Output format (currently only JSON supported) |
| `-o, --out-dir` | String | None | Output file path; if not provided, prints to stdout |
Model Types
The inference behavior depends on the model type configured:
1. Embedding Models
Generates vector embeddings for input text.
Example:
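encoderfile infer "Hello world"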
With normalization disabled:
2. Sequence Classification Models
Classifies entire sequences (e.g., sentiment analysis, topic classification).
Example:
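encoderfile infer "This is amazing!"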
3. Token Classification Models
Labels individual tokens (e.g., Named Entity Recognition, Part-of-Speech tagging).
Example:
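encoderfile infer "Apple Inc. is located in Cupertino, California"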
Output Formats
Currently, only JSON format is supported (--format json). The output structure varies by model type:
Embedding Output
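The exact embedding payload is not reproduced here; assuming it follows the same predictions/metadata envelope as the classification outputs, a response looks roughly like the following (the embedding field name and the values are illustrative):
{
  "predictions": [
    {
      "embedding": [0.0123, -0.0456, 0.0789]
    }
  ],
  "metadata": null
}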
Sequence Classification Output
{
"predictions": [
{
"label": "POSITIVE",
"score": 0.9876
},
{
"label": "NEGATIVE",
"score": 0.8765
}
],
"metadata": null
}
Token Classification Output
{
"predictions": [
{
"tokens": ["Apple", "Inc.", "is", "located", "in", "Cupertino", ",", "California"],
"labels": ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O", "B-LOC"]
}
],
"metadata": null
}
Saving Output to File
Save results to a file:
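encoderfile infer "Hello world" -o embedding.json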
Process multiple inputs and save:
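encoderfile infer \
  "First document to analyze" \
  "Second document to analyze" \
  --out-dir batch_results.json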
Configuration
The CLI relies on external configuration to determine:
- Model type (Embedding, SequenceClassification, TokenClassification)
- Model path and parameters
- Server settings
Ensure your configuration is properly set before running commands. Refer to the main encoderfile configuration documentation for details.
Error Handling
The CLI will return appropriate error messages for:
- Invalid configuration (e.g., both servers disabled)
- Missing required arguments
- Model loading failures
- Inference errors
- File I/O errors
Examples
Basic Inference Workflow
# Set up configuration (example)
export MODEL_PATH=/path/to/model
export MODEL_TYPE=embedding
# Run inference
encoderfile infer "Hello world"
# Save to file
encoderfile infer "Hello world" -o embedding.json
Server Workflow
# Terminal 1: Start server
encoderfile serve --http-port 8080
# Terminal 2: Make HTTP requests (using curl)
curl -X POST http://localhost:8080/infer \
-H "Content-Type: application/json" \
-d '{"inputs": ["Hello world"], "normalize": true}'
Batch Processing
# Process multiple inputs at once
encoderfile infer \
"First document to analyze" \
"Second document to analyze" \
"Third document to analyze" \
--out-dir batch_results.json
Custom Server Configuration
# Run on specific network interface with custom ports
encoderfile serve \
--http-hostname 192.168.1.100 \
--http-port 3000 \
--grpc-hostname 192.168.1.100 \
--grpc-port 50052
Troubleshooting
Both servers cannot be disabled
Error: "Cannot disable both gRPC and HTTP"
Solution: Enable at least one server type:
encoderfile serve --disable-grpc # Keep HTTP enabled
# OR
encoderfile serve --disable-http # Keep gRPC enabled
Output not appearing
If output isn't visible, check:
1. Ensure you're not redirecting output to a file unintentionally
2. Check file permissions if using --out-dir
3. Verify the model is correctly configured
Model type detection
The CLI automatically detects model type from configuration. If inference behaves unexpectedly:
1. Verify your model configuration
2. Ensure the model type matches your use case
3. Check model compatibility
Complete Workflow Example
Here's a complete workflow from model export to deployment:
Step 1: Export Model to ONNX
# Export a HuggingFace model to ONNX format
optimum-cli export onnx \
--model distilbert-base-uncased-finetuned-sst-2-english \
--task text-classification \
./models/sentiment-model
Step 2: Create Configuration File
Create sentiment-config.yml:
encoderfile:
  name: sentiment-analyzer
  version: "1.0.0"
  path: ./models/sentiment-model
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile
Step 3: Build Encoderfile Binary
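encoderfile build -f sentiment-config.yml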
This creates: ./build/sentiment-analyzer.encoderfile
Step 4: Run Inference
Option A: Start server and use HTTP/gRPC
# Start server
./build/sentiment-analyzer.encoderfile serve
# In another terminal - use HTTP API
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"inputs": ["This is amazing!", "This is terrible"]}'
Option B: Direct CLI inference
# Single inference
./build/sentiment-analyzer.encoderfile infer "This is amazing!"
# Batch inference
./build/sentiment-analyzer.encoderfile infer \
"This is amazing!" \
"This is terrible" \
"This is okay" \
-o results.json
Step 5: Deploy
# Copy binary to deployment location
cp ./build/sentiment-analyzer.encoderfile /usr/local/bin/sentiment-analyzer
# The binary is self-contained - no dependencies needed!
sentiment-analyzer serve --http-port 8080
Command Reference Summary
| Command | Tool | Purpose |
|---|---|---|
| `./target/release/encoderfile build -f config.yml` | encoderfile | Build self-contained binary from ONNX model |
| `./target/release/encoderfile version` | encoderfile | Print version information |
| `<model>.encoderfile serve` | encoderfile | Start HTTP/gRPC inference server |
| `<model>.encoderfile infer` | encoderfile | Run single inference from command line |
| `<model>.encoderfile mcp` | encoderfile | Start MCP server |
Additional Resources
- Getting Started Guide - Step-by-step tutorial
- API Reference - HTTP/gRPC/MCP API documentation
- BUILDING.md - Complete build guide with advanced configuration
- GitHub Repository - Source code and issues