Datasets API

A dataset is a collection of data points (or samples) used for training, testing, or evaluating machine learning models.

The API provides endpoints to upload a dataset, list all datasets, retrieve the details of a specific dataset, delete a dataset, and download a dataset. On upload, the dataset is converted to HuggingFace-compatible files; the recreated CSV file may use different delimiters than the original. Each endpoint returns status codes that indicate whether the request succeeded or failed.

Endpoints

POST /api/v1/datasets/

Upload Dataset

Uploads the dataset for use in Lumigator.

An uploaded dataset is parsed into HuggingFace format files and stored alongside a recreated version of the input dataset.

NOTE: The recreated CSV file may not use the same delimiters as the original, because it follows the format that HuggingFace uses when generating CSV files.
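
The delimiter caveat can be illustrated locally, without calling the API. The sketch below is not Lumigator's code. It assumes, purely for illustration, that the recreated file is written with comma delimiters, and it uses Python's standard csv module to show that the parsed rows survive even though the raw text changes. The column names, sample rows, and the helper rewrite_csv are made up.

```python
import csv
import io

# Hypothetical input: a semicolon-delimited CSV as a user might upload it.
original_text = "examples;ground_truth\nHello world;Hi\nA, B and C;ABC\n"


def rewrite_csv(text: str, in_delimiter: str, out_delimiter: str = ",") -> str:
    """Parse CSV text with one delimiter and write it back with another."""
    rows = list(csv.reader(io.StringIO(text), delimiter=in_delimiter))
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=out_delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Assumption: the recreated file uses commas; the real output format may differ.
recreated_text = rewrite_csv(original_text, in_delimiter=";")

# The raw text differs (delimiter and quoting), but the parsed rows match.
original_rows = list(csv.reader(io.StringIO(original_text), delimiter=";"))
recreated_rows = list(csv.reader(io.StringIO(recreated_text)))
print(recreated_text)
print(original_rows == recreated_rows)
```

In practice this means a downloaded CSV should be compared with the original by parsed content, not byte for byte.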

Status Codes:

GET /api/v1/datasets/

List Datasets

Query Parameters:
  • skip (integer)

  • limit (integer)
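
As a rough sketch of how the two query parameters might be combined for paging, the helper below builds List Datasets URLs with the standard library. It assumes that skip is an offset and limit is a page size, which the parameter names suggest but this reference does not state. The base URL, the default values, and the function name list_datasets_url are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a local deployment; replace with your own host and port.
BASE_URL = "http://localhost:8000"


def list_datasets_url(skip: int = 0, limit: int = 50, base_url: str = BASE_URL) -> str:
    """Build the List Datasets URL for one page of results."""
    if skip < 0 or limit < 1:
        raise ValueError("skip must be >= 0 and limit must be >= 1")
    query = urlencode({"skip": skip, "limit": limit})
    return f"{base_url}/api/v1/datasets/?{query}"


# URLs for the first three pages of 20 records each.
page_urls = [list_datasets_url(skip=page * 20, limit=20) for page in range(3)]
for url in page_urls:
    print(url)
```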

Status Codes:

GET /api/v1/datasets/{dataset_id}

Get Dataset

Parameters:
  • dataset_id (string)

Status Codes:

DELETE /api/v1/datasets/{dataset_id}

Delete Dataset

Parameters:
  • dataset_id (string)
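
The Get Dataset and Delete Dataset endpoints share the same path and differ only in HTTP method. A minimal sketch of building such requests with Python's urllib is shown below. Nothing is sent over the network here. The base URL, the sample dataset ID, and the helper dataset_request are assumptions for illustration.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local deployment; replace with your own host and port.
BASE_URL = "http://localhost:8000"


def dataset_request(dataset_id: str, method: str = "GET", base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a GET or DELETE request for one dataset."""
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be 'GET' or 'DELETE'")
    # Percent-encode the ID so characters such as '/' cannot alter the path.
    path_id = quote(dataset_id, safe="")
    return Request(f"{base_url}/api/v1/datasets/{path_id}", method=method)


# "example-dataset-id" is a placeholder, not a real dataset.
delete_req = dataset_request("example-dataset-id", method="DELETE")
print(delete_req.get_method(), delete_req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send the request; deletion should be assumed irreversible unless the deployment says otherwise.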

Status Codes:

GET /api/v1/datasets/{dataset_id}/download

Get Dataset Download

Returns a collection of pre-signed URLs which can be used to download the dataset.

Parameters:
  • dataset_id (string)

Query Parameters:
  • extension (string or null) – When specified, only URLs for files with a matching file extension are returned. Wildcards are not accepted. By default, all files are returned. Example: csv
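
The optional extension filter can be sketched as follows: the query parameter is omitted entirely when no filter is wanted, and wildcard characters are rejected on the client side before a request is made. The base URL, the sample dataset ID, and the helper download_url are hypothetical, and the client-side wildcard check is an illustration, not a description of how the server validates input.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: a local deployment; replace with your own host and port.
BASE_URL = "http://localhost:8000"


def download_url(dataset_id: str, extension: Optional[str] = None,
                 base_url: str = BASE_URL) -> str:
    """Build the Get Dataset Download URL, optionally filtered by extension."""
    if extension is not None and any(ch in extension for ch in "*?"):
        raise ValueError("wildcards are not accepted in 'extension'")
    url = f"{base_url}/api/v1/datasets/{quote(dataset_id, safe='')}/download"
    if extension is not None:
        # Only add the query string when a filter was requested.
        url += "?" + urlencode({"extension": extension})
    return url


all_files = download_url("example-dataset-id")
csv_only = download_url("example-dataset-id", extension="csv")
print(all_files)
print(csv_only)
```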

Status Codes: