Tracking
The experiment tracking module lets you log metrics, hyperparameters, and computation graphs across training runs, storing everything in a SQLite database. Tracker is the high-level API: you set an experiment, start a run, log scalars after each epoch, and end the run. Logged data can then be explored via the SimpleBoard web dashboard.
import simplegrad as sg

tracker = sg.Tracker(all_exp_dir="./experiments")
tracker.set_experiment("mnist_run")
tracker.start_run(name="baseline", config={"lr": 0.01, "batch_size": 32})
for epoch in range(10):
    loss = ...  # compute loss
    # record(metric_name, value, step) logs one scalar per call
    tracker.record("train_loss", float(loss.values), step=epoch)
tracker.end_run()
Tracker
Manages experiments and runs on top of ExperimentDBManager. It lets you select an experiment and start and end runs within it.
All experiment databases are stored in one main directory. One database corresponds to one experiment. Each experiment can have multiple runs, each with its own metrics and computation graphs.
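The layout described above can be pictured with a minimal sketch built on Python's standard sqlite3 module. The table names, column names, and the ".db" file extension are assumptions made for illustration; they are not necessarily what ExperimentDBManager creates.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: one SQLite file per experiment, many runs per file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name   TEXT,
    status TEXT NOT NULL DEFAULT 'running'
);
CREATE TABLE IF NOT EXISTS records (
    run_id      INTEGER NOT NULL REFERENCES runs(run_id),
    metric_name TEXT NOT NULL,
    step        INTEGER NOT NULL,
    value       REAL NOT NULL
);
"""


def open_experiment(all_exp_dir: Path, exp_name: str) -> sqlite3.Connection:
    """Open (or create) the database file that backs one experiment."""
    all_exp_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(all_exp_dir / f"{exp_name}.db")
    conn.executescript(SCHEMA)
    return conn


def demo_layout() -> tuple[list[str], int]:
    """Create two experiments; return (database file names, runs in 'mnist')."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "experiments"
        mnist = open_experiment(root, "mnist")
        cifar = open_experiment(root, "cifar")
        # Two runs stored inside the same experiment database.
        mnist.execute("INSERT INTO runs (name) VALUES ('baseline')")
        mnist.execute("INSERT INTO runs (name) VALUES ('lr=0.1')")
        mnist.commit()
        n_runs = mnist.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
        files = sorted(p.name for p in root.glob("*.db"))
        mnist.close()
        cifar.close()
        return files, n_runs
```

Calling demo_layout() leaves one file per experiment in the directory, with both runs living inside the "mnist" file.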
delete_run(run_id: int)
Delete a run and all its data.
end_run(status: str = 'completed')
End the current run with a given status.
get_all_exp_paths() -> list[Path]
Get all experiment database paths.
get_all_runs() -> list[RunInfo]
Get all runs.
get_comp_graph(graph_id: int) -> dict | None
Get a single computation graph by its ID.
get_comp_graphs(run_id: int) -> list[dict]
Get all computation graphs for a given run.
get_metrics(run_id: int) -> list[str]
Get all metric names for a given run.
get_records(run_id: int, metric_name: str) -> list[RecordInfo]
Get the records of a given metric for a given run.
get_results(run_id: int) -> dict[str, list[RecordInfo]]
Get all metric records for a given run, keyed by metric name.
get_run(run_id: int) -> RunInfo | None
Get a specific run by ID.
histogram(name: str, tensor: Tensor | np.ndarray, step: int, bins: int = 30)
Log a histogram of values at a given step.
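To make the bins parameter concrete, here is a small pure-Python sketch of equal-width bucketing that produces the two lists ExperimentDBManager.save_histogram expects (bucket_edges and bucket_counts). It is an assumption about how the values might be binned, not the library's implementation. The helper name bucketize is hypothetical, and the last bin is treated as closed on the right, as numpy.histogram does.

```python
def bucketize(values, bins=30):
    """Return (bucket_edges, bucket_counts); len(bucket_edges) == bins + 1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: widen the range so every bin has nonzero width.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of
        # falling off the end.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = bucketize([0.0, 0.1, 0.4, 0.5, 0.9, 1.0], bins=2)
```

With two bins over [0, 1], the first three values fall into the lower bucket and the remaining three into the upper one.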
image(name: str, image_data: np.ndarray, step: int)
Log an image at a given step. image_data should be a numpy array of shape (H, W, C) or (H, W).
record(metric_name: str, value: float, step: int)
Log a metric value at a given step.
run(name: str | None = None, config: dict | None = None)
Context manager that starts a run on enter and ends it on exit.
On clean exit the run is marked "completed". If the body raises, the run is marked "failed" and the exception is re-raised — so dashboards can distinguish crashed runs from successful ones without extra code in the training loop.
Example
tracker.set_experiment("mnist")
with tracker.run(name="lr=0.01", config={"lr": 0.01}) as run_id:
    for step in range(100):
        tracker.record("loss", loss_val, step)
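The completed/failed behaviour can be sketched with contextlib.contextmanager. FakeTracker below is a hypothetical stand-in that only remembers statuses. It shows one way such a context manager could be written and makes no claim about how Tracker.run is implemented. In particular, catching Exception rather than BaseException is an assumption.

```python
from contextlib import contextmanager


class FakeTracker:
    """Stand-in for Tracker that only remembers run statuses (illustration only)."""

    def __init__(self):
        self.statuses = {}
        self._current = 0

    def start_run(self, name=None, config=None):
        self._current += 1
        self.statuses[self._current] = "running"
        return self._current

    def end_run(self, status="completed"):
        # Stamp the most recently started run with its final status.
        self.statuses[self._current] = status

    @contextmanager
    def run(self, name=None, config=None):
        run_id = self.start_run(name=name, config=config)
        try:
            yield run_id
        except Exception:
            self.end_run(status="failed")
            raise  # re-raise so the caller still sees the original error
        else:
            self.end_run(status="completed")


def demo_statuses():
    tracker = FakeTracker()
    with tracker.run(name="ok"):
        pass
    try:
        with tracker.run(name="crash"):
            raise ValueError("loss diverged")
    except ValueError:
        pass  # the exception reached us, and the run was still marked failed
    return tracker.statuses
```

The training loop needs no try/except of its own: the status is decided by how the with block exits.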
save_comp_graph(tensor: Tensor, run_id: int | None = None)
Save the computation graph of tensor. If run_id is omitted, the current run is used.
set_all_exp_dir(directory: str)
Set the experiments directory.
set_experiment(exp_name: str)
Set the current experiment by name, initializing its database manager.
start_run(name: str | None = None, config: dict | None = None) -> int
Start a new run and return the run_id.
summary(run_id: int | None = None) -> dict[str, dict]
Return per-metric summary statistics (min/max/mean/std/last/n).
If run_id is omitted, summarizes the active run. The returned dict maps metric name -> {min, max, mean, std, last, n, first_step, last_step} and is the same data exposed at /api/runs/{id}/summary.
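As an illustration of how the returned mapping might be consumed, the sketch below compares runs by one summary field. The summaries dict is hand-written in the shape described above, with made-up numbers. It stands in for calling tracker.summary(run_id) once per run, and best_run is a hypothetical helper, not part of the library.

```python
# Hand-written stand-ins for tracker.summary(run_id) results, keyed by run_id.
# All numbers are invented for illustration.
summaries = {
    1: {"val_loss": {"min": 0.31, "max": 2.30, "mean": 0.80, "std": 0.55,
                     "last": 0.35, "n": 10, "first_step": 0, "last_step": 9}},
    2: {"val_loss": {"min": 0.27, "max": 2.28, "mean": 0.85, "std": 0.60,
                     "last": 0.40, "n": 10, "first_step": 0, "last_step": 9}},
}


def best_run(summaries, metric, field="min"):
    """Return the run_id with the smallest summaries[run_id][metric][field]."""
    candidates = {
        run_id: stats[metric][field]
        for run_id, stats in summaries.items()
        if metric in stats  # skip runs that never logged this metric
    }
    return min(candidates, key=candidates.get)


lowest_ever = best_run(summaries, "val_loss")           # compares "min"
lowest_final = best_run(summaries, "val_loss", "last")  # compares "last"
```

Here the two fields disagree: run 2 reached the lowest loss at some point, while run 1 finished lower. That difference is the reason to keep both min and last in the summary.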
text(name: str, content: str, step: int)
Log a text snippet at a given step.
Useful for recording sampled outputs from a generative model, validation predictions, free-form notes, or model summaries: anything you'd want to read back later in the dashboard. Mirrors TensorBoard's add_text.
Parameters:
- name (str) – Tag under which to group related texts (e.g. "samples").
- content (str) – The string to record. No length limit, but very long strings will bloat the database.
- step (int) – Training step the text is associated with.
ExperimentDBManager
SQLite-based storage for training runs and metrics.
check_connection() -> bool
Check if the database exists and is accessible.
create_run(name: str | None = None, config: dict | None = None) -> int
Create a new training run. Returns run_id.
delete_run(run_id: int)
Delete a run and all its data.
get_all_runs() -> list[RunInfo]
List all runs, newest first.
get_comp_graph(graph_id: int) -> dict | None
Get a single computation graph by its ID.
get_comp_graphs(run_id: int) -> list[dict]
Get all computation graphs for a run.
get_histograms(run_id: int) -> dict[str, list[dict]]
Get all histograms for a run.
get_images(run_id: int) -> dict[str, list[dict]]
Get all images for a run.
get_metric_summary(run_id: int) -> dict[str, dict]
Compute per-metric summary stats (min/max/mean/std/last/n) for a run.
Returns a mapping from metric name to a dict with the standard TensorBoard scalar-summary fields. The computation is cheap: it takes a single SQL pass per metric, and the records table is indexed on (run_id, step).
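The aggregate-query idea can be sketched with sqlite3 against an in-memory database. The table layout, index name, and helper name are assumptions for illustration. Core SQLite has no standard-deviation aggregate, so this sketch derives a population standard deviation from AVG(value * value), and it fetches the last value with a second small lookup. The real method may do both differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (run_id INTEGER, metric_name TEXT, step INTEGER, value REAL);
CREATE INDEX idx_records_run_step ON records (run_id, step);
""")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [(1, "loss", 0, 4.0), (1, "loss", 1, 2.0), (1, "loss", 2, 3.0),
     (2, "loss", 0, 9.0)],  # a second run that must not leak into run 1
)


def metric_summary(conn, run_id, metric_name):
    """Summarize one metric of one run using SQL aggregates."""
    lo, hi, mean, mean_sq, n, first_step, last_step = conn.execute(
        """
        SELECT MIN(value), MAX(value), AVG(value), AVG(value * value),
               COUNT(*), MIN(step), MAX(step)
        FROM records
        WHERE run_id = ? AND metric_name = ?
        """,
        (run_id, metric_name),
    ).fetchone()
    # The value recorded at the highest step is the "last" value.
    last = conn.execute(
        "SELECT value FROM records WHERE run_id = ? AND metric_name = ? AND step = ?",
        (run_id, metric_name, last_step),
    ).fetchone()[0]
    # Var(X) = E[X^2] - E[X]^2; clamp at 0 to absorb rounding error.
    variance = max(mean_sq - mean * mean, 0.0)
    return {"min": lo, "max": hi, "mean": mean, "std": variance ** 0.5,
            "last": last, "n": n, "first_step": first_step, "last_step": last_step}


loss_summary = metric_summary(conn, 1, "loss")
```

Because the database does the aggregation, only one row per metric crosses into Python regardless of how many steps were logged.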
get_metrics(run_id: int) -> list[str]
Get list of metric names for a run.
get_records(run_id: int, metric_name: str) -> list[RecordInfo]
Get the records of one metric for a run, as a list of RecordInfo.
get_run(run_id: int) -> RunInfo | None
Get run metadata.
get_texts(run_id: int) -> dict[str, list[dict]]
Get all text entries for a run, grouped by name.
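A rough sketch of what "grouped by name" could look like once rows have been read from the database. The "step" and "content" keys, the ordering by step, and the helper name are assumptions. The signature only guarantees a dict[str, list[dict]].

```python
from collections import defaultdict


def group_texts(rows):
    """Group (name, step, content) rows into {name: [{"step": ..., "content": ...}]}."""
    grouped = defaultdict(list)
    # Sort by name, then step, so each group's list is in step order.
    for name, step, content in sorted(rows, key=lambda r: (r[0], r[1])):
        grouped[name].append({"step": step, "content": content})
    return dict(grouped)


texts = group_texts([
    ("samples", 1, "the cat sat"),
    ("notes", 0, "lowered lr after warmup"),
    ("samples", 0, "teh cat"),
])
```

Grouping by tag lets a dashboard show how one kind of text, such as generated samples, evolves over training steps.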
init_exp_db()
Initialize database schema.
record(run_id: int, metric_name: str, step: int, value: float)
Log a single metric record.
save_comp_graph(run_id: int, graph_data: dict)
Save computation graph as JSON.
save_histogram(run_id: int, name: str, step: int, bucket_edges: list[float], bucket_counts: list[int])
Save a histogram.
save_image(run_id: int, name: str, step: int, width: int, height: int, channels: int, image_data: bytes)
Save raw image data.
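The relationship between an (H, W) or (H, W, C) image and the width, height, channels, and image_data arguments can be sketched as follows. To stay dependency-free the example uses nested lists rather than a numpy array, and it assumes 8-bit pixels stored row by row. Both choices, and the helper name to_raw_image, are assumptions rather than documented behaviour.

```python
def to_raw_image(pixels):
    """Flatten nested lists shaped (H, W) or (H, W, C), ints in 0..255.

    Returns (width, height, channels, image_data) in row-major order.
    """
    height = len(pixels)
    width = len(pixels[0])
    # A bare int per pixel means a single-channel (H, W) image.
    grayscale = not isinstance(pixels[0][0], (list, tuple))
    channels = 1 if grayscale else len(pixels[0][0])
    flat = bytearray()
    for row in pixels:
        for px in row:
            flat.extend([px] if grayscale else px)
    return width, height, channels, bytes(flat)


gray = to_raw_image([[0, 255], [128, 64], [1, 2]])   # shape (3, 2)
rgb = to_raw_image([[[255, 0, 0], [0, 255, 0]]])     # shape (1, 2, 3)
```

Note that the argument order puts width before height, while the array shape puts H before W, so the two are easy to swap by accident.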
save_text(run_id: int, name: str, step: int, content: str)
Save a text entry for a run.
Use this to record arbitrary string artifacts associated with a step: sampled model outputs, validation predictions, free-form notes, model summaries, etc. Mirrors TensorBoard's add_text semantics.
update_run_status(run_id: int, status: str)
Update run status.
RunInfo
dataclass
Metadata for a training run.
RecordInfo
dataclass
A single metric record (data point).
_build_graph_data(tensor: Tensor) -> dict
Build a JSON-serializable graph structure for D3.js visualization.
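For orientation, here is a minimal sketch of turning a computation graph into a node-link dict of the kind D3.js layouts commonly consume. The Node class is a hypothetical stand-in for a tensor that remembers its inputs. The "nodes"/"links" layout and the field names are assumptions, not the structure that _build_graph_data actually emits.

```python
import json


class Node:
    """Hypothetical stand-in for a tensor that remembers how it was produced."""

    def __init__(self, label, op=None, parents=()):
        self.label = label
        self.op = op
        self.parents = tuple(parents)


def build_graph_data(root):
    """Walk from the output back to the leaves; return {"nodes": [...], "links": [...]}."""
    ids, nodes, links = {}, [], []

    def visit(node):
        key = id(node)
        if key in ids:
            return ids[key]  # shared inputs are emitted only once
        ids[key] = len(nodes)
        nodes.append({"id": ids[key], "label": node.label, "op": node.op})
        for parent in node.parents:
            # Edges point from an input to the value computed from it.
            links.append({"source": visit(parent), "target": ids[key]})
        return ids[key]

    visit(root)
    return {"nodes": nodes, "links": links}


x, w = Node("x"), Node("w")
y = Node("y", op="mul", parents=(x, w))
z = Node("z", op="add", parents=(y, x))  # x feeds both y and z
graph = build_graph_data(z)
payload = json.dumps(graph)  # plain dicts and lists serialize directly
```

The recursive walk keeps the sketch short. For very deep graphs an explicit stack would avoid Python's recursion limit.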