Getting Started

This quick-start guide will help you build and run your first encoderfile in under 10 minutes.

Prerequisites

Encoderfile CLI Tool

You need the encoderfile CLI tool installed:

  • Pre-built binary (Linux/macOS): Download from releases
  • Build from source (all platforms): See BUILDING.md

Python with Optimum

For exporting models to ONNX:

pip install optimum[exporters]
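
To confirm the exporter is installed and on your PATH, ask it for its help text:

optimum-cli export onnx --help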

Your First Encoderfile

Let's build a sentiment analysis model as an example.

Step 1: Export Model to ONNX

Export a HuggingFace model to ONNX format:

optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification \
  ./sentiment-model

This creates a directory with the required files:

sentiment-model/
├── config.json
├── model.onnx                # ONNX weights
├── tokenizer.json            # Tokenizer
└── ... (other files)

Available task types:

  • feature-extraction - For embedding models
  • text-classification - For sequence classification
  • token-classification - For NER/token tagging

Step 2: Create Configuration File

Create sentiment-config.yml:

encoderfile:
  name: sentiment-analyzer
  version: "1.0.0"
  path: ./sentiment-model
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile

Key fields:

  • name - Model identifier (used in API responses)
  • path - Path to the model directory with ONNX weights
  • model_type - embedding, sequence_classification, or token_classification
  • output_path - Where to output the binary (optional, defaults to ./<name>.encoderfile)
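
Because output_path is optional, a minimal config that relies on the default also works (the file name here is just an example); this one would write ./sentiment-analyzer.encoderfile to the current directory:

cat > minimal-config.yml <<EOF
encoderfile:
  name: sentiment-analyzer
  path: ./sentiment-model
  model_type: sequence_classification
EOF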

Step 3: Build the Binary

Build your encoderfile:

encoderfile build -f sentiment-config.yml

Note: If you built the CLI from source, use: ./target/release/encoderfile build -f sentiment-config.yml

The binary will be created at ./build/sentiment-analyzer.encoderfile.
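
You can verify the artifact exists before running it:

ls -lh ./build/sentiment-analyzer.encoderfile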

Step 4: Run the Server

Start your encoderfile server:

chmod +x ./build/sentiment-analyzer.encoderfile
./build/sentiment-analyzer.encoderfile serve

You should see:

Starting HTTP server on 0.0.0.0:8080
Starting gRPC server on [::]:50051

Step 5: Make Predictions

Test with curl:

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      "This product is amazing!",
      "Terrible experience, very disappointed"
    ]
  }'

Expected response:

{
  "results": [
    {
      "logits": [-4.123, 4.567],
      "scores": [0.0001, 0.9999],
      "predicted_index": 1,
      "predicted_label": "POSITIVE"
    },
    {
      "logits": [4.234, -3.987],
      "scores": [0.9998, 0.0002],
      "predicted_index": 0,
      "predicted_label": "NEGATIVE"
    }
  ],
  "model_id": "sentiment-analyzer"
}
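
If you have jq installed, you can pull out just the predicted labels; for the input below this prints POSITIVE:

curl -s -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["This product is amazing!"]}' \
  | jq -r '.results[].predicted_label'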

Quick Examples

Embedding Model

# Export
optimum-cli export onnx \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  --task feature-extraction \
  ./embedding-model

# Config
cat > embedding-config.yml <<EOF
encoderfile:
  name: embedder
  path: ./embedding-model
  model_type: embedding
  output_path: ./build/embedder.encoderfile
EOF

# Build
encoderfile build -f embedding-config.yml

# Run
./build/embedder.encoderfile serve
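
As a quick smoke test, assuming the embedding binary serves the same /predict route as the classifier above (with vectors in place of logits in the response):

# assumes embedders expose the same /predict route as classifiers
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["Encode this sentence"]}'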

Token Classification (NER)

# Export
optimum-cli export onnx \
  --model dslim/bert-base-NER \
  --task token-classification \
  ./ner-model

# Config
cat > ner-config.yml <<EOF
encoderfile:
  name: ner
  path: ./ner-model
  model_type: token_classification
  output_path: ./build/ner.encoderfile
EOF

# Build
encoderfile build -f ner-config.yml

# Run
./build/ner.encoderfile serve
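
You can also skip the server and get a one-off result with the infer subcommand (covered under CLI Inference below):

./build/ner.encoderfile infer "John lives in New York"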

Common Tasks

Server Configuration

Custom ports:

./build/my-model.encoderfile serve --http-port 3000 --grpc-port 50052

HTTP only (disable gRPC):

./build/my-model.encoderfile serve --disable-grpc

gRPC only (disable HTTP):

./build/my-model.encoderfile serve --disable-http
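
Assuming the serve flags combine as usual, here is an HTTP-only server on a custom port, exercised with the same /predict route:

# assumes serve flags can be combined
./build/my-model.encoderfile serve --disable-grpc --http-port 3000 &
curl -X POST http://localhost:3000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["Test sentence"]}'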

CLI Inference

Run inference without starting a server:

# Single input
./build/my-model.encoderfile infer "Test sentence"

# Multiple inputs
./build/my-model.encoderfile infer "First" "Second" "Third"

# Save to file
./build/my-model.encoderfile infer "Test" -o results.json
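
Since infer takes multiple positional inputs, you can batch a file of sentences with GNU xargs (the -d flag is not available in BSD/macOS xargs):

# each line of inputs.txt becomes one positional argument
xargs -d '\n' ./build/my-model.encoderfile infer < inputs.txt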

Using Pre-Exported Models

Some HuggingFace models already have ONNX weights:

# Clone model with existing ONNX weights
git clone https://huggingface.co/optimum/distilbert-base-uncased-finetuned-sst-2-english

# Build directly
cat > config.yml <<EOF
encoderfile:
  name: sentiment
  path: ./distilbert-base-uncased-finetuned-sst-2-english
  model_type: sequence_classification
  output_path: ./build/sentiment.encoderfile
EOF

encoderfile build -f config.yml
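
HuggingFace stores large weights in Git LFS, so set it up before cloning, or the .onnx file will arrive as a small pointer stub:

git lfs install
# if you cloned before installing LFS, fetch the real weights from inside the repo:
git lfs pull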

Troubleshooting

ONNX Export Fails

  • Check model compatibility (must be encoder-only)
  • Try a different task type
  • Check the model's HuggingFace page for known issues

Build Fails

  • Ensure the model directory has model.onnx, tokenizer.json, and config.json (a quick check follows this list)
  • Verify the model type matches the architecture
  • See BUILDING.md for detailed troubleshooting
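
A quick way to check for the required files:

ls ./sentiment-model
# expect at least: config.json  model.onnx  tokenizer.json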

Server Won't Start

  • Check if ports are already in use (see the check after this list)
  • Try different ports with --http-port and --grpc-port
  • Check file permissions: chmod +x ./build/my-model.encoderfile
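
To see whether something is already bound to the default ports (ss on Linux; lsof -i :8080 works on macOS):

ss -ltn | grep -E ':(8080|50051)'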

Inference Errors

  • Check input format matches the expected schema
  • Verify the server is running
  • Check server logs for error messages

Next Steps