Tracking

The experiment tracking module lets you log metrics, hyperparameters, and computation graphs across training runs, storing everything in a SQLite database. Tracker is the high-level API: you set an experiment, start a run, log scalars after each epoch, and end the run. Logged data can then be explored via the SimpleBoard web dashboard.

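import simplegrad as sg

tracker = sg.Tracker(all_exp_dir="./experiments")
tracker.set_experiment("mnist_run")
# start_run takes `name` and `config` (see the reference below) and returns the run id
run_id = tracker.start_run(name="baseline", config={"lr": 0.01, "batch_size": 32})

for epoch in range(10):
    loss = ...  # compute loss (placeholder)
    # record(metric_name, value, step) logs one scalar per call
    tracker.record("train_loss", float(loss.values), step=epoch)

tracker.end_run()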

`Tracker`

Manages experiments and runs on top of ExperimentDBManager. It lets you select an experiment, start and end runs, and log data to the active run.

There is a main directory where all experiment databases are stored. One database corresponds to one experiment. Each experiment can have multiple runs, each with its own metrics and computational graphs.
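The layout described above can be sketched with the standard library alone. The sketch assumes each experiment is a single `<exp_name>.db` file directly under the main directory; the file extension, the `runs` table, and the helper names `exp_db_path` and `list_exp_paths` are illustrative assumptions, not the module's actual internals.

```python
import sqlite3
import tempfile
from pathlib import Path


def exp_db_path(all_exp_dir: Path, exp_name: str) -> Path:
    """Hypothetical helper: map an experiment name to its database file."""
    return all_exp_dir / f"{exp_name}.db"


def list_exp_paths(all_exp_dir: Path) -> list:
    """Hypothetical helper: treat every *.db file in the directory as one experiment."""
    return sorted(all_exp_dir.glob("*.db"))


def demo() -> list:
    """Create two experiment databases in a temp directory and list them by name."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for exp_name in ("mnist", "cifar"):
            # Connecting creates the database file if it does not exist yet
            conn = sqlite3.connect(exp_db_path(root, exp_name))
            conn.execute("CREATE TABLE runs (run_id INTEGER PRIMARY KEY, name TEXT)")
            conn.commit()
            conn.close()
        return [path.stem for path in list_exp_paths(root)]
```

Keeping one file per experiment means an experiment can be copied, archived, or deleted by moving a single file, without touching the others.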

`delete_run(run_id: int)`

Delete a run and all its data.

`end_run(status: str = 'completed')`

End the current run with the given status.

`get_all_exp_paths() -> list[Path]`

Get all experiment database paths.

`get_all_runs() -> list[RunInfo]`

Get all runs.

`get_comp_graph(graph_id: int) -> dict | None`

Get a single computation graph by its ID.

`get_comp_graphs(run_id: int) -> list[dict]`

Get all computation graphs for a given run.

`get_metrics(run_id: int) -> list[str]`

Get all metric names for a given run.

`get_records(run_id: int, metric_name: str) -> list[RecordInfo]`

Get the records of a single metric for a given run.

`get_results(run_id: int) -> dict[str, list[RecordInfo]]`

Get all metric records for a given run, keyed by metric name.

`get_run(run_id: int) -> RunInfo | None`

Get a specific run by its ID.

`histogram(name: str, tensor: Tensor | np.ndarray, step: int, bins: int = 30)`

Log a histogram of values at a given step.
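The storage layer takes a histogram as `bucket_edges` plus `bucket_counts` (see `save_histogram` below). As a rough illustration of how raw values might be reduced to that form, here is a pure-Python sketch using equal-width bins between the minimum and maximum, with the last bin closed on the right. Whether `histogram` bins values exactly this way is an assumption, and `bucketize` is a hypothetical name.

```python
def bucketize(values, bins=30):
    """Equal-width histogram: returns (bins + 1 edges, bins counts)."""
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate range: widen it so the bin width is non-zero
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    edges[-1] = hi  # pin the final edge to the maximum to avoid float drift
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = bucketize([0.0, 0.1, 0.4, 0.5, 0.9, 1.0], bins=2)
```

With `bins` buckets there are always `bins + 1` edges, which is why the two lists passed to the storage layer differ in length by one.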

`image(name: str, image_data: np.ndarray, step: int)`

Log an image at a given step. image_data should be a NumPy array of shape (H, W, C) or (H, W).

`record(metric_name: str, value: float, step: int)`

Log a metric value at a given step.

`run(name: str | None = None, config: dict | None = None)`

Context manager that starts a run on enter and ends it on exit.

On clean exit the run is marked "completed". If the body raises, the run is marked "failed" and the exception is re-raised — so dashboards can distinguish crashed runs from successful ones without extra code in the training loop.

Example

tracker.set_experiment("mnist")
with tracker.run(name="lr=0.01", config={"lr": 0.01}) as run_id:
    for step in range(100):
        tracker.record("loss", loss_val, step)
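The completed/failed behaviour described above can be reproduced with `contextlib.contextmanager`. The sketch below uses a hypothetical `MiniTracker` that only remembers run statuses in a dict; it is not the real `Tracker`, and catching `BaseException` (so that an interrupt is also recorded as failed) is an assumption of this sketch.

```python
from contextlib import contextmanager
from typing import Optional


class MiniTracker:
    """Hypothetical stand-in that only remembers run statuses."""

    def __init__(self):
        self.statuses = {}  # run_id -> status string
        self._next_id = 1

    def start_run(self, name: Optional[str] = None, config: Optional[dict] = None) -> int:
        run_id = self._next_id
        self._next_id += 1
        self.statuses[run_id] = "running"
        return run_id

    def end_run(self, run_id: int, status: str = "completed") -> None:
        self.statuses[run_id] = status

    @contextmanager
    def run(self, name: Optional[str] = None, config: Optional[dict] = None):
        run_id = self.start_run(name, config)
        try:
            yield run_id
        except BaseException:
            # Mark the run as failed, then let the exception propagate
            self.end_run(run_id, status="failed")
            raise
        else:
            self.end_run(run_id)


tracker = MiniTracker()

with tracker.run(name="ok") as ok_id:
    pass  # training loop would go here

try:
    with tracker.run(name="boom") as bad_id:
        raise RuntimeError("loss diverged")
except RuntimeError:
    pass  # the exception still reaches the caller
```

Because the status is set in the `except` branch before re-raising, the run is closed even when the training loop crashes, and the caller still sees the original exception.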

`save_comp_graph(tensor: Tensor, run_id: int | None = None)`

Save the computation graph for the current run.

`set_all_exp_dir(directory: str)`

Set the experiments directory.

`set_experiment(exp_name: str)`

Set the current experiment by name, initializing its database manager.

`start_run(name: str | None = None, config: dict | None = None) -> int`

Start a new run and return its run_id.

`summary(run_id: int | None = None) -> dict[str, dict]`

Return per-metric summary statistics (min/max/mean/std/last/n).

If run_id is omitted, summarizes the active run. The returned dict maps metric name -> {min, max, mean, std, last, n, first_step, last_step} and is the same data exposed at /api/runs/{id}/summary.
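One way to use the returned mapping is to compare runs on a single statistic. The sketch below works on plain dicts shaped like the documented return value (metric name -> stats), which in practice would come from calling `summary(run_id)` once per run. The helper `best_run` and the sample numbers are hypothetical.

```python
def best_run(summaries, metric, field="min"):
    """Return the run_id whose summary has the smallest `field` for `metric`.

    `summaries` maps run_id -> {metric_name: {"min": ..., "max": ..., ...}}.
    Runs that never logged `metric` are skipped.
    """
    candidates = {
        run_id: stats[metric][field]
        for run_id, stats in summaries.items()
        if metric in stats
    }
    if not candidates:
        raise KeyError(f"no run logged metric {metric!r}")
    return min(candidates, key=candidates.get)


# Illustrative data only; real values would come from summary(run_id)
summaries = {
    1: {"val_loss": {"min": 0.31, "last": 0.35, "n": 10}},
    2: {"val_loss": {"min": 0.27, "last": 0.40, "n": 10}},
    3: {"train_loss": {"min": 0.10, "last": 0.10, "n": 10}},
}

best_by_min = best_run(summaries, "val_loss")           # lowest value ever reached
best_by_last = best_run(summaries, "val_loss", "last")  # lowest final value
```

Comparing on `min` versus `last` can pick different runs, as it does here, which is a quick way to spot a run that overfit after reaching its best value.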

`text(name: str, content: str, step: int)`

Log a text snippet at a given step.

Useful for recording sampled outputs from a generative model, validation predictions, free-form notes, or model summaries — anything you'd want to read back later in the dashboard. Mirrors TensorBoard's add_text.

Parameters:

name (str) – Tag under which to group related texts (e.g. "samples").

content (str) – The string to record. No length limit, but very long strings will bloat the database.

step (int) – Training step the text is associated with.

`ExperimentDBManager`

SQLite-based storage for training runs and metrics.

`check_connection() -> bool`

Check if the database exists and is accessible.

`create_run(name: str | None = None, config: dict | None = None) -> int`

Create a new training run. Returns run_id.

`delete_run(run_id: int)`

Delete a run and all its data.

`get_all_runs() -> list[RunInfo]`

List all runs, newest first.

`get_comp_graph(graph_id: int) -> dict | None`

Get a single computation graph by its ID.

`get_comp_graphs(run_id: int) -> list[dict]`

Get all computation graphs for a run.

`get_histograms(run_id: int) -> dict[str, list[dict]]`

Get all histograms for a run.

`get_images(run_id: int) -> dict[str, list[dict]]`

Get all images for a run.

`get_metric_summary(run_id: int) -> dict[str, dict]`

Compute per-metric summary stats (min/max/mean/std/last/n) for a run.

Returns a mapping from metric name to a dict with the standard TensorBoard scalar-summary fields. Cheap because it's a single SQL pass per metric and the records table is indexed on (run_id, step).
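To make the aggregation concrete, here is a minimal sketch against an in-memory SQLite database. The `records` table layout, the index name, and the function `metric_summary` are assumptions for illustration; the real schema may differ. The sketch uses one grouped query plus one small lookup per metric for `last`, and derives a population standard deviation from `AVG(value * value)` because SQLite has no built-in standard-deviation aggregate.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; the real table and column names may differ
conn.execute(
    "CREATE TABLE records (run_id INTEGER, metric_name TEXT, step INTEGER, value REAL)"
)
conn.execute("CREATE INDEX idx_records_run_step ON records (run_id, step)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, "loss", 0, 4.0),
        (1, "loss", 1, 2.0),
        (1, "loss", 2, 0.0),
        (1, "acc", 0, 0.5),
        (2, "loss", 0, 9.0),  # belongs to another run and must be ignored
    ],
)


def metric_summary(conn, run_id):
    """Return {metric_name: stats} for one run."""
    rows = conn.execute(
        "SELECT metric_name, MIN(value), MAX(value), AVG(value), "
        "AVG(value * value), COUNT(*), MIN(step), MAX(step) "
        "FROM records WHERE run_id = ? GROUP BY metric_name",
        (run_id,),
    ).fetchall()
    out = {}
    for name, lo, hi, mean, mean_sq, n, first_step, last_step in rows:
        # "last" is the value recorded at the highest step
        (last,) = conn.execute(
            "SELECT value FROM records WHERE run_id = ? AND metric_name = ? "
            "ORDER BY step DESC LIMIT 1",
            (run_id, name),
        ).fetchone()
        # Population variance via E[x^2] - E[x]^2, clamped against rounding error
        std = math.sqrt(max(0.0, mean_sq - mean * mean))
        out[name] = {
            "min": lo, "max": hi, "mean": mean, "std": std, "last": last,
            "n": n, "first_step": first_step, "last_step": last_step,
        }
    return out


summary = metric_summary(conn, run_id=1)
```

Pushing the aggregation into SQL means only one row per metric crosses into Python, regardless of how many steps were logged.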

`get_metrics(run_id: int) -> list[str]`

Get list of metric names for a run.

`get_records(run_id: int, metric_name: str) -> list[RecordInfo]`

Get the records of a single metric for a run. Returns a list of RecordInfo.

`get_run(run_id: int) -> RunInfo | None`

Get run metadata.

`get_texts(run_id: int) -> dict[str, list[dict]]`

Get all text entries for a run, grouped by name.

`init_exp_db()`

Initialize database schema.

`record(run_id: int, metric_name: str, step: int, value: float)`

Log a single metric record.

`save_comp_graph(run_id: int, graph_data: dict)`

Save computation graph as JSON.

`save_histogram(run_id: int, name: str, step: int, bucket_edges: list[float], bucket_counts: list[int])`

Save a histogram.

`save_image(run_id: int, name: str, step: int, width: int, height: int, channels: int, image_data: bytes)`

Save raw image data.

`save_text(run_id: int, name: str, step: int, content: str)`

Save a text entry for a run.

Use this to record arbitrary string artifacts associated with a step: sampled model outputs, validation predictions, free-form notes, model summaries, etc. Mirrors TensorBoard's add_text semantics.
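The pairing of `save_text` and `get_texts` (entries grouped by name) can be sketched with an in-memory SQLite table. The `texts` table layout and the keys of the returned dicts (`step`, `content`) are assumptions; the functions below are standalone illustrations, not the manager's methods.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Assumed schema for illustration only
conn.execute("CREATE TABLE texts (run_id INTEGER, name TEXT, step INTEGER, content TEXT)")


def save_text(conn, run_id, name, step, content):
    """Insert one text entry for a run."""
    conn.execute(
        "INSERT INTO texts (run_id, name, step, content) VALUES (?, ?, ?, ?)",
        (run_id, name, step, content),
    )
    conn.commit()


def get_texts(conn, run_id):
    """Return {name: [{"step": ..., "content": ...}, ...]} ordered by step."""
    grouped = defaultdict(list)
    cursor = conn.execute(
        "SELECT name, step, content FROM texts WHERE run_id = ? ORDER BY name, step",
        (run_id,),
    )
    for name, step, content in cursor:
        grouped[name].append({"step": step, "content": content})
    return dict(grouped)


save_text(conn, 1, "samples", 100, "the cat sat")
save_text(conn, 1, "samples", 0, "xq zzv")
save_text(conn, 1, "notes", 0, "lowered lr after warmup")
save_text(conn, 2, "samples", 0, "other run")

texts = get_texts(conn, run_id=1)
```

Grouping by name lets a dashboard show each tag as its own timeline, for example how sampled outputs change from step 0 to step 100.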

`update_run_status(run_id: int, status: str)`

Update run status.

`RunInfo` `dataclass`

Metadata for a training run.

`RecordInfo` `dataclass`

A single metric record (data point).

`_build_graph_data(tensor: Tensor) -> dict`

Build a JSON-serializable graph structure for D3.js visualization.
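As an illustration of what a JSON-serializable graph structure might look like, the sketch below walks a tiny computation graph and emits the `nodes`/`links` layout commonly fed to D3.js force layouts. The `Node` class, its `parents` and `op` attributes, and the output keys are all assumptions; the real function operates on `Tensor` and may use a different schema.

```python
import json


class Node:
    """Hypothetical stand-in for a tensor that remembers how it was produced."""

    def __init__(self, label, op=None, parents=()):
        self.label = label
        self.op = op  # None for leaf nodes
        self.parents = tuple(parents)


def build_graph_data(root):
    """Collect every node reachable from `root` and the edges between them."""
    index = {}  # id(node) -> position in `order`
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in index:
            continue  # shared inputs are emitted only once
        index[id(node)] = len(order)
        order.append(node)
        stack.extend(node.parents)
    nodes = [{"id": i, "label": n.label, "op": n.op} for i, n in enumerate(order)]
    # Edges point from an input to the node computed from it
    links = [
        {"source": index[id(parent)], "target": index[id(node)]}
        for node in order
        for parent in node.parents
    ]
    return {"nodes": nodes, "links": links}


# z = (x * w) + x, so x feeds two different operations
x = Node("x")
w = Node("w")
y = Node("y", op="mul", parents=(x, w))
z = Node("z", op="add", parents=(y, x))

graph = build_graph_data(z)
payload = json.dumps(graph)
```

The iterative traversal avoids Python's recursion limit on deep graphs, and deduplicating by identity keeps a tensor that is reused in several places as a single node.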
