Dataset Tools
Overview
In Arize, datasets are collections of examples used for experimentation and evaluation. Each dataset contains rows of data organized by columns, with support for string, numeric, and list-of-string values.
In arize_toolkit, the Client exposes helpers for:
- Listing all datasets in a space
- Retrieving dataset metadata by name or ID
- Fetching all example rows from the latest dataset version
| Operation | Helper |
|---|---|
| List all datasets | get_all_datasets |
| Get dataset by name | get_dataset |
| Get dataset by ID | get_dataset_by_id |
| Get dataset examples | get_dataset_examples |
Dataset Operations
get_all_datasets
datasets: list[dict] = client.get_all_datasets()
Retrieves all datasets in the current space. Automatically paginates through all results.
Returns
A list of dataset dictionaries, each containing id, name, createdAt, updatedAt, datasetType, status, columns, experimentCount.
Example
datasets = client.get_all_datasets()
for ds in datasets:
print(f"{ds['name']} ({ds['status']}) - {ds['experimentCount']} experiments")
get_dataset
dataset: dict = client.get_dataset(name="my-dataset")
Retrieves a dataset by name within the current space.
Parameters
name(str) — Name of the dataset to retrieve.
Returns
A dictionary with dataset metadata: id, name, createdAt, updatedAt, datasetType, status, columns, experimentCount.
Example
dataset = client.get_dataset(name="pharmacy-malicious-baseline")
print(dataset["id"]) # "RGF0YXNldDox..."
print(dataset["status"]) # "active"
print(dataset["columns"]) # ["input", "output", "timestamp"]
get_dataset_by_id
dataset: dict = client.get_dataset_by_id(dataset_id="RGF0YXNldDox...")
Retrieves a dataset by its ID.
Parameters
dataset_id(str) — ID of the dataset (base64-encoded).
Returns
A dictionary with dataset metadata: id, name, createdAt, updatedAt, datasetType, status, columns, experimentCount.
Example
dataset = client.get_dataset_by_id(dataset_id="RGF0YXNldDox...")
print(dataset["name"]) # "pharmacy-malicious-baseline"
get_dataset_examples
examples: list[dict] = client.get_dataset_examples(name="my-dataset")
Retrieves all examples (rows) from the latest version of a dataset. Automatically paginates through all results. Each example contains a data dictionary mapping column names to their values.
Parameters
name(Optional[str]) — Name of the dataset. Eithernameordataset_idmust be provided.dataset_id(Optional[str]) — ID of the dataset. Eithernameordataset_idmust be provided.
Returns
A list of example dictionaries, each containing:
id(str) — Unique identifier for the examplecreatedAt(datetime) — When the example was createdupdatedAt(datetime) — When the example was last updateddata(dict) — Column name to value mapping
Example
examples = client.get_dataset_examples(name="pharmacy-malicious-baseline")
for ex in examples:
print(ex["id"]) # "RGF0YXNldEV4YW1wbGU6..."
print(ex["data"]["input"]) # "What is the dosage for..."
print(ex["data"]["output"]) # "The recommended dosage..."
# Or by dataset ID
examples = client.get_dataset_examples(dataset_id="RGF0YXNldDox...")
print(len(examples)) # 50