Tracking

The experiment tracking module lets you log metrics, hyperparameters, and computation graphs across training runs, storing everything in a SQLite database. Tracker is the high-level API: you set an experiment, start a run, log scalars after each epoch, and end the run. Logged data can then be explored via the SimpleBoard web dashboard.

import simplegrad as sg

tracker = sg.Tracker(all_exp_dir="./experiments")
tracker.set_experiment("mnist_run")
tracker.start_run(name="baseline", config={"lr": 0.01, "batch_size": 32})

for epoch in range(10):
    loss = ...  # compute loss
    tracker.record("train_loss", float(loss.values), step=epoch)

tracker.end_run()

Tracker

Manages experiments and runs on top of ExperimentDBManager. It lets you set the current experiment and start and end runs.

There is a main directory where all experiment databases are stored. One database corresponds to one experiment. Each experiment can have multiple runs, each with its own metrics and computational graphs.
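
The layout described above can be sketched with the standard library alone. This is an illustration, not the library's code: the `.db` suffix, the `runs` table, and the helper names `experiment_db_path` and `list_experiment_dbs` are all assumptions made for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def experiment_db_path(all_exp_dir: Path, exp_name: str) -> Path:
    """Map an experiment name to its database file (the .db suffix is an assumption)."""
    return all_exp_dir / f"{exp_name}.db"


def list_experiment_dbs(all_exp_dir: Path) -> list:
    """Return every experiment database found in the main directory, sorted by name."""
    return sorted(all_exp_dir.glob("*.db"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for exp_name in ("mnist", "cifar"):
        # One database file per experiment; the table layout here is hypothetical.
        conn = sqlite3.connect(experiment_db_path(root, exp_name))
        conn.execute("CREATE TABLE runs (run_id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
        conn.commit()
        conn.close()
    found = [p.name for p in list_experiment_dbs(root)]
    print(found)
```

Keeping each experiment in its own file means an experiment can be copied, archived, or deleted without touching the others.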

delete_run(run_id: int)

Delete a run and all its data

end_run(status: str = 'completed')

End the current run with a given status

get_all_exp_paths() -> list[Path]

Get all experiment database paths.

get_all_runs() -> list[RunInfo]

Get all runs

get_comp_graph(graph_id: int) -> dict | None

Get a single computation graph by its ID.

get_comp_graphs(run_id: int) -> list[dict]

Get all computation graphs for a given run

get_metrics(run_id: int) -> list[str]

Get all metric names for a given run

get_records(run_id: int, metric_name: str) -> list[RecordInfo]

Get the records of one metric for a given run.

get_results(run_id: int) -> dict[str, list[RecordInfo]]

Get all metric records for a given run

get_run(run_id: int) -> RunInfo | None

Get a specific run by id

histogram(name: str, tensor: Tensor | np.ndarray, step: int, bins: int = 30)

Log a histogram of values at a given step
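
To make the bins parameter concrete, here is a dependency-free sketch of equal-width bucketing that yields the bucket_edges and bucket_counts pair that ExperimentDBManager.save_histogram accepts. The function name `bucketize` and the handling of a constant input are assumptions. The library may well delegate to NumPy's `np.histogram`, which computes counts and edges of this kind.

```python
def bucketize(values, bins=30):
    """Split values into equal-width buckets; return (bucket_edges, bucket_counts)."""
    if not values:
        raise ValueError("cannot build a histogram from no values")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate range: widen it so every bucket has non-zero width.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    # bins buckets need bins + 1 edges; append hi exactly to avoid rounding drift.
    edges = [lo + i * width for i in range(bins)] + [hi]
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bucket, not one past the end.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = bucketize([0.0, 0.1, 0.3, 0.9, 1.0], bins=5)
print(edges)
print(counts)
```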

image(name: str, image_data: np.ndarray, step: int)

Log an image at a given step. image_data should be a numpy array of shape (H, W, C) or (H, W)

record(metric_name: str, value: float, step: int)

Log a metric value at a given step

run(name: str | None = None, config: dict | None = None)

Context manager that starts a run on enter and ends it on exit.

On clean exit the run is marked "completed". If the body raises, the run is marked "failed" and the exception is re-raised — so dashboards can distinguish crashed runs from successful ones without extra code in the training loop.

Example

tracker.set_experiment("mnist")
with tracker.run(name="lr=0.01", config={"lr": 0.01}) as run_id:
    for step in range(100):
        tracker.record("loss", loss_val, step)
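
The completed/failed behaviour can be reproduced in isolation with `contextlib.contextmanager`. The sketch below is not the Tracker implementation: the in-memory `run_status` dict and the intermediate "running" status are stand-ins invented for the example.

```python
from contextlib import contextmanager
from typing import Optional

# Hypothetical in-memory stand-in for the run table: run_id -> status.
run_status = {}


@contextmanager
def run(name: Optional[str] = None):
    """Start a run on enter; mark it completed or failed on exit."""
    run_id = len(run_status) + 1
    run_status[run_id] = "running"  # assumed intermediate status
    try:
        yield run_id
    except BaseException:
        # The body raised: record the failure, then let the exception propagate.
        run_status[run_id] = "failed"
        raise
    else:
        run_status[run_id] = "completed"


with run(name="ok") as ok_id:
    pass

try:
    with run(name="crash") as bad_id:
        raise RuntimeError("diverged")
except RuntimeError:
    pass

print(run_status)
```

Because the exception is re-raised after the status update, the caller still sees the original traceback.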

save_comp_graph(tensor: Tensor, run_id: int | None = None)

Save the computation graph of a tensor to the given run, or to the current run when run_id is omitted.

set_all_exp_dir(directory: str)

Set the experiments directory

set_experiment(exp_name: str)

Set the current experiment by name, initializing its database manager.

start_run(name: str | None = None, config: dict | None = None) -> int

Start a new run and return the run_id

summary(run_id: int | None = None) -> dict[str, dict]

Return per-metric summary statistics (min/max/mean/std/last/n).

If run_id is omitted, summarizes the active run. The returned dict maps metric name -> {min, max, mean, std, last, n, first_step, last_step} and is the same data exposed at /api/runs/{id}/summary.

text(name: str, content: str, step: int)

Log a text snippet at a given step.

Useful for recording sampled outputs from a generative model, validation predictions, free-form notes, or model summaries — anything you'd want to read back later in the dashboard. Mirrors TensorBoard's add_text.

Parameters:

  • name (str) –

    Tag under which to group related texts (e.g. "samples").

  • content (str) –

    The string to record. No length limit, but very long strings will bloat the database.

  • step (int) –

    Training step the text is associated with.


ExperimentDBManager

SQLite-based storage for training runs and metrics.

check_connection() -> bool

Check if the database exists and is accessible.

create_run(name: str | None = None, config: dict | None = None) -> int

Create a new training run. Returns run_id.

delete_run(run_id: int)

Delete a run and all its data.

get_all_runs() -> list[RunInfo]

List all runs, newest first.

get_comp_graph(graph_id: int) -> dict | None

Get a single computation graph by its ID.

get_comp_graphs(run_id: int) -> list[dict]

Get all computation graphs for a run.

get_histograms(run_id: int) -> dict[str, list[dict]]

Get all histograms for a run.

get_images(run_id: int) -> dict[str, list[dict]]

Get all images for a run.

get_metric_summary(run_id: int) -> dict[str, dict]

Compute per-metric summary stats (min/max/mean/std/last/n) for a run.

Returns a mapping from metric name to a dict with the standard TensorBoard scalar-summary fields. Cheap because it's a single SQL pass per metric and the records table is indexed on (run_id, step).
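
As an illustration of how such a summary can be computed in SQL, the sketch below aggregates one metric with `sqlite3`. The `records` table, its column names, and the index name are assumptions, as is the use of the population standard deviation. SQLite has no built-in standard deviation, so the variance is derived from two averages, and this sketch uses a separate ordered lookup for the last value, which the library may fold into its main query.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the real table and column names may differ.
conn.execute(
    "CREATE TABLE records (run_id INTEGER, metric_name TEXT, step INTEGER, value REAL)"
)
conn.execute("CREATE INDEX idx_records_run_step ON records (run_id, step)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [(1, "loss", 0, 1.0), (1, "loss", 1, 0.5), (1, "loss", 2, 0.25)],
)

# One aggregate query for the metric: variance = AVG(value^2) - AVG(value)^2.
row = conn.execute(
    """
    SELECT MIN(value), MAX(value), AVG(value), AVG(value * value), COUNT(*),
           MIN(step), MAX(step)
    FROM records
    WHERE run_id = ? AND metric_name = ?
    """,
    (1, "loss"),
).fetchone()
lo, hi, mean, mean_sq, n, first_step, last_step = row
# Clamp at zero to guard against tiny negative values from rounding.
std = math.sqrt(max(mean_sq - mean * mean, 0.0))

# The value at the highest step needs its own ordered lookup.
last = conn.execute(
    "SELECT value FROM records WHERE run_id = ? AND metric_name = ? "
    "ORDER BY step DESC LIMIT 1",
    (1, "loss"),
).fetchone()[0]

summary = {"min": lo, "max": hi, "mean": mean, "std": std, "last": last, "n": n,
           "first_step": first_step, "last_step": last_step}
print(summary)
conn.close()
```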

get_metrics(run_id: int) -> list[str]

Get list of metric names for a run.

get_records(run_id: int, metric_name: str) -> list[RecordInfo]

Get the records of one metric for a run, as a list of RecordInfo.

get_run(run_id: int) -> RunInfo | None

Get run metadata.

get_texts(run_id: int) -> dict[str, list[dict]]

Get all text entries for a run, grouped by name.
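
The grouped return shape, dict[str, list[dict]], might be assembled as in the sketch below. The `texts` table, its columns, and the keys of each entry dict ("step", "content") are assumptions for illustration; the actual entries may carry other fields.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name
# Hypothetical schema; the real table and column names may differ.
conn.execute("CREATE TABLE texts (run_id INTEGER, name TEXT, step INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO texts VALUES (?, ?, ?, ?)",
    [
        (1, "samples", 0, "the cat"),
        (1, "notes", 0, "lr warmup enabled"),
        (1, "samples", 10, "the cat sat"),
    ],
)

grouped = defaultdict(list)
rows = conn.execute(
    "SELECT name, step, content FROM texts WHERE run_id = ? ORDER BY name, step", (1,)
)
for row in rows:
    # One list per tag, with entries in step order.
    grouped[row["name"]].append({"step": row["step"], "content": row["content"]})

texts = dict(grouped)
print(texts)
conn.close()
```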

init_exp_db()

Initialize database schema.

record(run_id: int, metric_name: str, step: int, value: float)

Log a single metric record.

save_comp_graph(run_id: int, graph_data: dict)

Save computation graph as JSON.

save_histogram(run_id: int, name: str, step: int, bucket_edges: list[float], bucket_counts: list[int])

Save a histogram.

save_image(run_id: int, name: str, step: int, width: int, height: int, channels: int, image_data: bytes)

Save raw image data.

save_text(run_id: int, name: str, step: int, content: str)

Save a text entry for a run.

Use this to record arbitrary string artifacts associated with a step: sampled model outputs, validation predictions, free-form notes, model summaries, etc. Mirrors TensorBoard's add_text semantics.

update_run_status(run_id: int, status: str)

Update run status.

RunInfo dataclass

Metadata for a training run.

RecordInfo dataclass

A single metric record (data point).


_build_graph_data(tensor: Tensor) -> dict

Build a JSON-serializable graph structure for D3.js visualization.
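
A JSON-serializable structure of this kind is often a list of nodes plus a list of links, the shape commonly fed to D3.js graph layouts. The sketch below shows one way to produce it. The `Node` class stands in for a tensor, and the "nodes"/"links" keys and their fields are assumptions, not the library's actual output format.

```python
import json
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a tensor: a label plus the nodes it was computed from."""
    label: str
    parents: list = field(default_factory=list)


def build_graph_data(root: Node) -> dict:
    """Walk from the output back to the leaves and emit nodes and links."""
    index = {}  # id(node) -> position in the nodes list
    nodes, links = [], []

    def visit(node: Node) -> int:
        if id(node) in index:
            return index[id(node)]  # shared inputs are emitted only once
        node_id = index[id(node)] = len(nodes)
        nodes.append({"id": node_id, "label": node.label})
        for parent in node.parents:
            # Links point in the direction of data flow: input -> result.
            links.append({"source": visit(parent), "target": node_id})
        return node_id

    visit(root)
    return {"nodes": nodes, "links": links}


# z = (x * w) + x, so x feeds two operations.
x, w = Node("x"), Node("w")
y = Node("mul", [x, w])
z = Node("add", [y, x])

graph = build_graph_data(z)
print(json.dumps(graph, indent=2))
```

The recursive walk keeps the sketch short; a very deep graph would need an explicit stack to stay within Python's recursion limit.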