Quickstart
Now that you have a local deployment of Lumigator, you can start using it. In this quickstart guide, we will show you how to upload a dataset and create a simple evaluation job. Finally, we’ll show you how to retrieve the results of the evaluation job.
Upload a Dataset
The Lumigator backend provides API endpoints for uploading datasets and running evaluation jobs. To view the available endpoints, navigate to the API documentation page at http://localhost:8000/docs.
There are a few ways to interact with the API:
- The OpenAPI documentation page at http://localhost:8000/docs, where you can test the endpoints interactively.
- cURL commands.
- The Lumigator Python SDK.
We'll focus on the last two.
To upload a dataset, you need to send a POST request to the /datasets
endpoint. The request should
include the dataset file. Here is an example:
user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/datasets/ \
-H 'Accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'dataset=@'"path/to/dataset.csv"';type=text/csv' \
-F 'format=experiment' | jq
{
"id": "dd15bbaa-8d6f-44ae-a995-b3b78f4ea6fb",
"filename": "dataset.csv",
"format": "experiment",
"size": 180528,
"ground_truth": true,
"created_at": "2024-10-30T12:10:18"
}
from lumigator_sdk.lumigator import LumigatorClient
from schemas.datasets import DatasetFormat

dataset_path = 'path/to/dataset.csv'

# Connect to the local Lumigator backend and upload the dataset file
lm_client = LumigatorClient('localhost:8000')
response = lm_client.datasets.create_dataset(
    open(dataset_path, 'rb'),
    DatasetFormat.JOB
)
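The returned object mirrors the JSON shown in the cURL example above. As a quick sanity check (assuming the SDK response exposes the same id and filename fields shown in that output), you can print them:

# Hypothetical sanity check: field names are taken from the cURL response above
print(f"Uploaded dataset {response.filename} with id {response.id}")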
Note
The dataset file should be in CSV format and contain a header row with the following columns: examples, ground_truth. The ground_truth column is optional, since you can generate it using Lumigator. See here for an example.
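For illustration only, here is a minimal sketch of how you could create such a file with Python's standard csv module; the rows below are made-up placeholders, not part of the official documentation:

import csv

# Hypothetical example dataset with the expected header row.
# The ground_truth column can be omitted if you plan to generate it with Lumigator.
rows = [
    {"examples": "The quick brown fox jumps over the lazy dog.", "ground_truth": "A fox jumps over a dog."},
    {"examples": "Lumigator helps you evaluate language models.", "ground_truth": "Lumigator evaluates models."},
]

with open("dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["examples", "ground_truth"])
    writer.writeheader()
    writer.writerows(rows)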
You can verify that the dataset was uploaded successfully by asking the API to list all datasets and checking that the uploaded dataset is in the list:
user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/datasets/ | jq -r '.items | .[] | .filename'
dataset.csv
datasets = lm_client.datasets.get_datasets()
print(datasets.items[0].filename)
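If you have uploaded more than one dataset, you can iterate over all items instead of printing only the first one, mirroring the cURL listing above:

# List the filename of every uploaded dataset
for dataset in datasets.items:
    print(dataset.filename)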
Create an Evaluation Job
Now that you have uploaded a dataset, you can create an evaluation job. To this end, you need to
send a POST request to the /jobs/evaluate
endpoint. The request should include the following
required fields:
- A name for the evaluation job.
- A short description for tracking purposes.
- The name of the model you want to evaluate.
- The ID of the dataset you want to use for evaluation.
- The maximum number of examples to use for evaluation.
Here is an example of how to create an evaluation job:
Set the following variables:
user@host:~/lumigator$ export EVAL_NAME="test_run_hugging_face" \
EVAL_DESC="Test run for Huggingface model" \
EVAL_MODEL="hf://facebook/bart-large-cnn" \
EVAL_DATASET="$(curl -s http://localhost:8000/api/v1/datasets/ | jq -r '.items | .[0].id')" \
EVAL_MAX_SAMPLES="10"
Define the JSON string:
user@host:~/lumigator$ export JSON_STRING=$(jq -n \
--arg name "$EVAL_NAME" \
--arg desc "$EVAL_DESC" \
--arg model "$EVAL_MODEL" \
--arg dataset_id "$EVAL_DATASET" \
--arg max_samples "$EVAL_MAX_SAMPLES" \
'{name: $name, description: $desc, model: $model, dataset: $dataset_id, max_samples: $max_samples}'
)
Create the evaluation job:
user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/jobs/evaluate/ \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d "$JSON_STRING" | jq
{
"id": "3f15667d-d2e7-459b-9c22-3da2d236b406",
"name": "test_run_hugging_face",
"description": "Test run for Huggingface model",
"status": "created",
"created_at": "2024-10-31T09:07:43",
"updated_at": null
}
from schemas.jobs import JobType, JobCreate

dataset_id = datasets.items[0].id
models = ['hf://facebook/bart-large-cnn',]

# set this value to limit the evaluation to the first max_samples items (0=all)
max_samples = 10

# team_name is a way to group jobs together under the same namespace, feel free to customize it
team_name = "lumigator_enthusiasts"

responses = []
for model in models:
    job_args = JobCreate(
        name=team_name,
        description="Test",
        model=model,
        dataset=str(dataset_id),
        max_samples=max_samples
    )
    # descr = f"Testing {model} summarization model on {dataset_name}"
    responses.append(lm_client.jobs.create_job(JobType.EVALUATION, job_args))
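Each response contains the id of the newly created job, matching the id field returned by the cURL call above; you can print the ids to keep track of the submitted jobs:

# Print the id of every job that was just submitted
for job_response in responses:
    print(job_response.id)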
Track the Evaluation Job
You can track the status of the evaluation job by sending a GET request to the /health/jobs/{submission_id} endpoint, or by using the Lumigator Python SDK. Here is an example of how to track the evaluation job:
Get the job’s submission ID:
user@host:~/lumigator$ export SUBMISSION_ID=$(curl -s http://localhost:8000/api/v1/health/jobs/ | jq -r 'sort_by(.start_time) | reverse | .[0] | .submission_id')
Track the job:
user@host:~/lumigator$ curl -s "http://localhost:8000/api/v1/health/jobs/$SUBMISSION_ID" \
-H 'Accept: application/json' | jq
{
"type": "SUBMISSION",
"job_id": null,
"submission_id": "5195c9a5-938d-475e-b0fc-cf866492909d",
"driver_info": null,
"status": "SUCCEEDED",
...
}
job_id = responses[0].id
job = lm_client.jobs.wait_for_job(job_id) # Create the coroutine object
result = await job # Await the coroutine to get the result
print(result)
Retrieve the Results
Once the evaluation job is complete, you can retrieve the results by sending a GET request to the
/jobs/{job_id}/result/download
endpoint, or by using the Lumigator Python SDK. This will return a
URI that you can use to download the results. Here is an example of how to retrieve the results:
user@host:~/lumigator$ curl -s http://localhost:8000/api/v1/jobs/$SUBMISSION_ID/result/download \
-H 'accept: application/json' | jq
{
"id": "5195c9a5-938d-475e-b0fc-cf866492909d",
"download_url": "http://localhost:4566/lumigator-storage/jobs/results/lumigator_enthusiasts/5195c9a5-938d-475e-b0fc-cf866492909d/eval_results.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=test%2F20241031%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20241031T104126Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=0309fe4825bc2358180c607a4a4ad4e8d36946133574d8b9416df228ce62944e"
}
import requests

# Retrieve the pre-signed download URL and fetch the results file
eval_result = lm_client.jobs.get_job_download(job_id)
response = requests.get(eval_result.download_url)
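The downloaded file is the eval_results.json artifact referenced by the download URL. As a minimal sketch, assuming the file parses as JSON, you can load it, inspect its top-level keys, and optionally keep a local copy:

import json

# The exact structure of eval_results.json is not documented here,
# so we only list the top-level keys.
eval_data = json.loads(response.text)
print(list(eval_data.keys()))

# Keep a local copy for later inspection
with open("eval_results.json", "w") as f:
    json.dump(eval_data, f, indent=2)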
The metrics we use for evaluation are ROUGE, METEOR, and BERTScore. They all measure the similarity between the model-generated summaries and the ground truth provided with the dataset, but each focuses on different aspects:
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation) compares a model-generated summary against the reference (ground truth) summary, producing a set of scores between 0 and 1 that measure the statistical overlap between the two texts.
- METEOR looks at the harmonic mean of precision and recall.
- BERTScore generates embeddings of the ground truth and the model output and compares their cosine similarity.
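To make these metrics more concrete, here is an illustrative, standalone sketch, independent of Lumigator, that computes them with the Hugging Face evaluate library (this assumes the evaluate, rouge_score, nltk, and bert_score packages are installed):

import evaluate

predictions = ["A fox jumps over a dog."]
references = ["The quick brown fox jumps over the lazy dog."]

# ROUGE: n-gram and longest-common-subsequence overlap, scored between 0 and 1
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))

# METEOR: harmonic mean of unigram precision and recall, with extra matching heuristics
meteor = evaluate.load("meteor")
print(meteor.compute(predictions=predictions, references=references))

# BERTScore: cosine similarity between contextual embeddings of prediction and reference
bertscore = evaluate.load("bertscore")
print(bertscore.compute(predictions=predictions, references=references, lang="en"))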
Next Steps
Congratulations! You have successfully uploaded a dataset, created an evaluation job, and retrieved the results. In the next section, we will show you how to deploy Lumigator in a distributed environment.