# Quickstart

You can deploy Lumigator either locally or into a distributed environment [using Kubernetes](../operations-guide/kubernetes-installation.md). In this quickstart guide, we'll start Lumigator locally and evaluate a model on a dataset that we upload.

Looking to develop Lumigator? If so, you'll be interested in the [local development guide](../operations-guide/local-development.md) for details of running in development mode.

If you hit any issues running the quickstart, we want to hear about it! You can submit an issue [here](https://github.com/mozilla-ai/lumigator/issues).

## Prerequisites

Lumigator is currently supported on Linux and Mac. Windows is not yet tested, but we welcome contributions; see the {{ '[Contributing Guide](https://github.com/mozilla-ai/lumigator/blob/{}/CONTRIBUTING.md)'.format(commit_id) }}.

Before you start, make sure you have the following:

- A working installation of [Docker](https://docs.docker.com/engine/install/):
  - On Mac, Docker Desktop >= 4.37 and `docker-compose` >= 2.31.
  - On Linux, please also complete the [post-installation steps](https://docs.docker.com/engine/install/linux-postinstall/).
- The directory `$HOME/.cache/huggingface/` must exist and be readable and writable. Lumigator uses this directory to access cached Hugging Face Hub models.
- If you want to evaluate against hosted API-based LLM services such as the platforms provided by OpenAI, Mistral, or DeepSeek, you will need to store the appropriate API key as a secret in Lumigator. Refer to [API settings configuration](../operations-guide/configuration.md#api-settings) for more details.
- If your system has an NVIDIA GPU, you need to have [installed the NVIDIA Container Toolkit following their instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html). After that, open a terminal and run:

  ```bash
  export RAY_WORKER_GPUS=1
  export RAY_WORKER_GPUS_FRACTION=1.0
  export GPU_COUNT=1
  ```

  **Important:** Run the next deployment steps in this same terminal, or add these environment variables to your shell configuration.

## Local Deployment

Lumigator runs locally using `docker-compose`. To deploy the latest release of Lumigator:

1. Clone the Lumigator repository:

   ```console
   user@host:~$ git clone git@github.com:mozilla-ai/lumigator.git
   ```

1. Change to the Lumigator directory:

   ```console
   user@host:~$ cd lumigator
   ```

1. Run the `start-lumigator` make target:

   ```console
   user@host:~/lumigator$ make start-lumigator
   ```

This starts everything Lumigator needs, creating multiple container services networked together to make up the Lumigator application:

- `minio`: Local, S3-API-compatible storage for datasets.
- `backend`: Lumigator's FastAPI REST API. Access the Swagger HTTP docs at http://localhost:8000/docs
- `ray`: A Ray cluster for submitting several types of jobs. Access the Ray dashboard at http://localhost:8265
- `mlflow`: Used to track experiments and metrics, accessible at http://localhost:8001
- `frontend`: Lumigator's Web UI, accessible at http://localhost:80

### Verify

To verify that Lumigator is running, open a browser and navigate to `http://localhost:8000`. You should receive the following JSON response:

```json
{"Hello": "Lumigator!🐊"}
```

## Using Lumigator

Now that Lumigator is deployed, we can use it to compare a few models.
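All of the Python SDK snippets in the rest of this guide talk to this local deployment through a `LumigatorClient`. As a quick sanity check before continuing, the minimal sketch below creates the client and confirms that the API answers with the greeting shown in the verification step; the plain `requests` call is only an illustration and is not part of the SDK.

```python
import requests

from lumigator_sdk.lumigator import LumigatorClient

# The same client the Python SDK examples below construct; adjust the host and
# port if you changed the deployment defaults.
client = LumigatorClient("localhost:8000")

# Plain HTTP check against the root endpoint; this should print the greeting
# {'Hello': 'Lumigator!🐊'} returned by the backend.
print(requests.get("http://localhost:8000/").json())
```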
In this guide, we'll evaluate GPT-4o on the task of summarization, using a few samples of the [dialogsum](https://github.com/cylnlp/dialogsum) dataset that we store {{ '[here](https://github.com/mozilla-ai/lumigator/blob/{}/lumigator/sample_data/summarization/dialogsum_exc.csv)'.format(commit_id) }}. Lumigator also supports evaluation on translation tasks; refer to the [translation evaluation guide](../user-guides/translation-eval.md) for details.

We will show how to do this using either cURL or the Lumigator SDK. See [the UI guide](ui-guide.md) for information about how to use the UI. The steps are as follows:

1. Upload the dialogsum dataset to Lumigator
1. Create an experiment, which is a container for running the workflow for each model
1. Run the summarization workflow for the model
1. Retrieve the results of the workflow

### Upload a Dataset

To upload a dataset, send a POST request to the `/datasets` endpoint, including the dataset file in the request. To run this example, first `cd` to the `lumigator` directory, then run:

::::{tab-set}

:::{tab-item} cURL
:sync: tab1

```console
user@host:~/lumigator$ export DATASET_PATH=lumigator/sample_data/summarization/dialogsum_exc.csv
user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/datasets/ \
  -H 'Accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'dataset=@'$DATASET_PATH';type=text/csv' \
  -F 'format=job' | jq
{
  "id": "9ac42307-e0e5-4635-a9ce-48acdb451742",
  "filename": "dialogsum_exc.csv",
  "format": "job",
  "size": 3603,
  "ground_truth": true,
  "run_id": null,
  "generated": false,
  "generated_by": null,
  "created_at": "2025-02-19T20:00:01"
}
```
:::

:::{tab-item} Python SDK
:sync: tab2

```python
from lumigator_sdk.lumigator import LumigatorClient
from lumigator_schemas.datasets import DatasetFormat

dataset_path = 'lumigator/sample_data/summarization/dialogsum_exc.csv'
client = LumigatorClient('localhost:8000')

response = client.datasets.create_dataset(
    open(dataset_path, 'rb'),
    DatasetFormat.JOB
)
```
:::

::::

```{note}
The dataset file should be in CSV format and contain a header row with the following columns: `examples`, `ground_truth`. The `ground_truth` column is optional, since you can generate it using Lumigator. See {{ '[here](https://github.com/mozilla-ai/lumigator/blob/{}/lumigator/sample_data/summarization/dialogsum_exc.csv#L4)'.format(commit_id) }} for an example.
```

You can verify that the dataset was uploaded successfully by asking the API to list all datasets and checking that the uploaded dataset is in the list:

::::{tab-set}

:::{tab-item} cURL
:sync: tab1

```console
user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/datasets/ | jq -r '.items | .[] | .filename'
dialogsum_exc.csv
```
:::

:::{tab-item} Python SDK
:sync: tab2

```python
datasets = client.datasets.get_datasets()
print(datasets.items[-1].filename)
```
:::

::::
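If you want to evaluate your own data rather than the bundled sample, the file only needs to follow the CSV layout described in the note above. Below is a minimal sketch that writes such a file with Python's standard `csv` module; the rows and the `my_dataset.csv` filename are made-up placeholders.

```python
import csv

# Hypothetical rows purely for illustration; replace them with your own source
# texts and, optionally, reference summaries.
rows = [
    {"examples": "A long dialogue to summarize...", "ground_truth": "A short reference summary."},
    {"examples": "Another dialogue...", "ground_truth": "Another reference summary."},
]

# Write a header row with the columns Lumigator expects: `examples` and the
# optional `ground_truth`.
with open("my_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["examples", "ground_truth"])
    writer.writeheader()
    writer.writerows(rows)
```

The resulting file can then be uploaded exactly like the sample dataset, by pointing `DATASET_PATH` (or the SDK call) at `my_dataset.csv`.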
### Create an Experiment

Now that you have uploaded a dataset, you can create an experiment. An experiment is a container for running evaluations of models against the dataset. To create one, send a POST request to the `/experiments` endpoint. The request should include the following required fields:

- A name for the experiment.
- A short description.
- The ID of the dataset you want to use for evaluations.
- The task definition, which is `summarization` for this example.

Here is an example of how to create an experiment:

```{note}
The steps assume you have uploaded only a single dataset. If you have multiple datasets uploaded, this command will use the latest one. If you want a different dataset, replace the `"$(curl -s http://localhost:8000/api/v1/datasets/ | jq -r '.items | .[0].id')"` code with the ID of the dataset you'd like to use.
```

::::{tab-set}

:::{tab-item} cURL
:sync: tab1

Set the following variables:

```console
user@host:~/lumigator$ export EXP_NAME="DialogSum Summarization" \
  EXP_DESC="See which model best summarizes Dialogues." \
  EXP_DATASET="$(curl -s http://localhost:8000/api/v1/datasets/ | jq -r '.items | .[0].id')" \
  EXP_TASK="summarization"
```

Define the JSON string:

```console
user@host:~/lumigator$ export JSON_STRING=$(jq -n \
  --arg name "$EXP_NAME" \
  --arg desc "$EXP_DESC" \
  --arg dataset_id "$EXP_DATASET" \
  --arg task "$EXP_TASK" \
  '{name: $name, description: $desc, dataset: $dataset_id, task_definition: {task: $task}}')
```

Create the experiment:

```console
user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/experiments/ \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "$JSON_STRING" | jq
{
  "id": "1",
  "name": "DialogSum Summarization",
  "description": "See which model best summarizes Dialogues.",
  "created_at": "2025-02-19T20:11:55.492000",
  "task_definition": {
    "task": "summarization"
  },
  "dataset": "9ac42307-e0e5-4635-a9ce-48acdb451742",
  "updated_at": null,
  "workflows": null
}
```
:::

:::{tab-item} Python SDK
:sync: tab2

```python
from lumigator_schemas.experiments import ExperimentCreate
from lumigator_schemas.tasks import SummarizationTaskDefinition

dataset_id = datasets.items[-1].id

request = ExperimentCreate(
    name="DialogSum Summarization",
    description="See which model best summarizes Dialogues.",
    dataset=dataset_id,
    task_definition=SummarizationTaskDefinition()
)

experiment_response = client.experiments.create_experiment(request)
experiment_id = experiment_response.id
print(f"Experiment created and has ID: {experiment_id}")
```
:::

::::

### Create a new secret for your OpenAI API key

Before creating your first evaluation workflow with an OpenAI model, you need to create a secret in Lumigator to store your OpenAI API key. Lumigator uses this secret to authenticate with the OpenAI API.

::::{tab-set}

:::{tab-item} cURL
:sync: tab1

Set the following variables:

```console
user@host:~/lumigator$ export SECRET_NAME="openai_api_key" \
  VALUE="sk-..." \
  DESCRIPTION="My OpenAI API Key"
```

Define the JSON string:

```console
user@host:~/lumigator$ export JSON_STRING=$(jq -n \
  --arg value "$VALUE" \
  --arg desc "$DESCRIPTION" \
  '{value: $value, description: $desc}')
```

Submit the secret:

```console
user@host:~/lumigator$ curl -X PUT http://localhost:8000/api/v1/settings/secrets/$SECRET_NAME \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "$JSON_STRING" | jq
```
:::

:::{tab-item} Python SDK
:sync: tab2

```python
api_key = client.settings.secrets.APIKey.OPENAI

response = client.settings.secrets.upload_api_key(
    api_key,
    secret_value="sk-...",
)
```
:::

::::

### Trigger the workflows

Now it's time to evaluate a model! Let's trigger a workflow to evaluate GPT-4o. This process can be repeated for as many models as you would like to evaluate in the experiment; a scripted sketch of adding more models appears after the results section below.

```{note}
The steps assume you have created only a single experiment. If you have multiple experiments, replace the `"$(curl -s http://localhost:8000/api/v1/experiments/ | jq -r '.items | .[0].id')"` code with the ID of the experiment you want.
```

::::{tab-set}

:::{tab-item} cURL
:sync: tab1

Set the following variables:

```console
user@host:~/lumigator$ export WORKFLOW_NAME="OpenAI 4o" \
  WORKFLOW_DESC="Summarize with 4o." \
  EXPERIMENT_ID="$(curl -s http://localhost:8000/api/v1/experiments/ | jq -r '.items | .[0].id')" \
  SECRET_KEY_NAME="openai_api_key"
```

Define the JSON string:

```console
user@host:~/lumigator$ export JSON_STRING=$(jq -n \
  --arg name "$WORKFLOW_NAME" \
  --arg model "gpt-4o" \
  --arg provider "openai" \
  --arg desc "$WORKFLOW_DESC" \
  --arg secret_key_name "$SECRET_KEY_NAME" \
  --arg exp_id "$EXPERIMENT_ID" \
  '{name: $name, description: $desc, model: $model, provider: $provider, secret_key_name: $secret_key_name, experiment_id: $exp_id}')
```

Trigger the workflow:

```console
user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/workflows/ \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "$JSON_STRING" | jq
{
  "id": "2ad30ca4aa7c4aa59eb6dc7235eebd57",
  "experiment_id": "1",
  "model": "gpt-4o",
  "name": "OpenAI 4o",
  "description": "Summarize with 4o.",
  "system_prompt": "You are a helpful assistant, expert in text summarization. For every prompt you receive, provide a summary of its contents in at most two sentences.",
  "status": "created",
  "created_at": "2025-03-24T09:15:30.376000",
  "updated_at": null
}
```
:::

:::{tab-item} Python SDK
:sync: tab2

```python
from lumigator_schemas.workflows import WorkflowCreateRequest

request = WorkflowCreateRequest(
    name="OpenAI 4o",
    description="Summarize with 4o.",
    model="gpt-4o",
    provider="openai",
    secret_key_name="openai_api_key",
    experiment_id=experiment_id
)

client.workflows.create_workflow(request).model_dump()
```
:::

::::

### Get the results

Now that the workflow has been triggered, we can fetch the experiment, which gives us the details of all the workflows it contains. Once the workflows have completed, this call returns all of the information about the evaluation, letting you compare results and review performance.

::::{tab-set}

:::{tab-item} cURL
:sync: tab1

Set the following variables:

```console
user@host:~/lumigator$ export EXPERIMENT_ID="$(curl -s http://localhost:8000/api/v1/experiments/ | jq -r '.items | .[0].id')"
```

Get the experiment:

```console
user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/experiments/$EXPERIMENT_ID | jq
{
  "id": "1",
  "name": "DialogSum Summarization",
  "description": "See which model best summarizes Dialogues.",
  "created_at": "2025-02-19T20:11:55.492000",
  "task_definition": {
    "task": "summarization"
  },
  "dataset": "9ac42307-e0e5-4635-a9ce-48acdb451742",
  "updated_at": "2025-02-19T20:11:55.492000",
  "workflows": [
    {
      "id": "ffa38f72fe7e4b06a60de5bf797c31d6",
      "experiment_id": "1",
      "model": "gpt-4o",
      "name": "OpenAI 4o",
      "description": "Summarize with 4o.",
      "status": "succeeded",
      "created_at": "2025-02-19T20:30:33.713000",
      "updated_at": null,
      "jobs": [...],
      "metrics": {
        "rouge1_mean": 0.224,
        "rouge2_mean": 0.106,
        "rougeL_mean": 0.195,
        "rougeLsum_mean": 0.195,
        "bertscore_f1_mean": 0.872,
        "bertscore_precision_mean": 0.866,
        "bertscore_recall_mean": 0.878,
        "meteor_mean": 0.276
      }
    }
  ]
}
```
:::

:::{tab-item} Python SDK
:sync: tab2

```python
experiment_details = client.experiments.get_experiment(experiment_id)
print(experiment_details.model_dump_json())
```
:::

::::
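Once the workflows in the experiment report a `succeeded` status, the same experiment call also carries the per-model scores, so you can compare models directly from the SDK. The following minimal sketch is based on the response fields shown in the cURL example above; exact attribute types may vary slightly between SDK versions.

```python
# Fetch the experiment again and print each workflow's status and metrics.
experiment_details = client.experiments.get_experiment(experiment_id)

for workflow in experiment_details.workflows or []:
    # `metrics` (rouge1_mean, bertscore_f1_mean, meteor_mean, ...) is only
    # populated once the workflow has finished successfully.
    print(f"{workflow.name} ({workflow.model}) - status: {workflow.status}")
    print(f"  metrics: {workflow.metrics}")
```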
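To add more models to the comparison, trigger one additional workflow per model/provider pair against the same experiment, as mentioned in the trigger step above. The sketch below is illustrative only: the Mistral model name, provider string, and `mistral_api_key` secret name are assumptions, and each hosted provider needs its own API key stored as a secret first.

```python
from lumigator_schemas.workflows import WorkflowCreateRequest

# Hypothetical additional candidate; the GPT-4o workflow was already created above.
more_candidates = [
    {
        "name": "Mistral Small",
        "model": "mistral-small-latest",
        "provider": "mistral",
        "secret_key_name": "mistral_api_key",
    },
]

for candidate in more_candidates:
    request = WorkflowCreateRequest(
        description=f"Summarize with {candidate['model']}.",
        experiment_id=experiment_id,
        **candidate,
    )
    client.workflows.create_workflow(request)
```

Re-running the results call afterwards returns one entry per workflow under `workflows`, which is what makes the side-by-side comparison possible.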
The metrics we use to evaluate are ROUGE, METEOR, and BERTScore. They all measure the similarity between the predicted summaries and the ground-truth summaries, but each focuses on different aspects:

- [ROUGE](https://aclanthology.org/W04-1013.pdf) (Recall-Oriented Understudy for Gisting Evaluation) - Compares the automatically generated summary against the reference summary provided as ground truth, producing a family of scores between `0` and `1` based on n-gram and longest-common-subsequence overlap.
- [METEOR](https://aclanthology.org/W05-0909.pdf) - Based on the harmonic mean of unigram precision and recall, with recall weighted more heavily than precision.
- [BERTScore](https://arxiv.org/abs/1904.09675) - Generates embeddings of the ground truth and the model output and compares their cosine similarity.

## Terminate Session

To shut down Lumigator, stop the containers that were started with Docker Compose by running:

```console
user@host:~/lumigator$ make stop-lumigator
```

## Next Steps

Congratulations! You have successfully uploaded a dataset, created an experiment, run a model evaluation in the experiment, and retrieved the results.

For information about developing Lumigator, see the [local development guide](../operations-guide/local-development.md). For information about deploying Lumigator into a Kubernetes cluster, see the [Kubernetes installation guide](../operations-guide/kubernetes-installation.md).