Loss Functions

Loss functions measure the discrepancy between model predictions and targets. ce_loss (cross-entropy) is the standard choice for classification; it expects raw logits and integer class indices. mse_loss (mean squared error) is the usual choice for regression. With reduction set to 'mean' (the default) or 'sum', both return a scalar Tensor, and calling .backward() on it runs backpropagation through the whole network.

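import simplegrad as sg

# Batch of 4 samples with 10 classes each: raw, unnormalized scores
logits = sg.normal((4, 10), requires_grad=True)
# One target class per sample
targets = sg.Tensor([2, 7, 0, 5])
# Scalar loss under the default reduction='mean'
loss = sg.ce_loss(logits, targets)
# Backpropagate from the scalar loss
loss.backward()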
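To make concrete what loss.backward() computes at the logits, the following NumPy sketch (not simplegrad code) evaluates the gradient of the mean cross-entropy with respect to the logits, (softmax(z) - onehot(y)) / N, and checks it against central finite differences. It assumes integer class labels as in the example above; the helper names mean_ce and mean_ce_grad are hypothetical.

```python
import numpy as np

def mean_ce(z, y):
    """Mean cross-entropy for logits z of shape (N, C) and integer labels y of shape (N,)."""
    shifted = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    logp = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

def mean_ce_grad(z, y):
    """Analytic gradient of mean_ce with respect to z: (softmax(z) - onehot(y)) / N."""
    shifted = z - z.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0  # subtract the one-hot target
    return probs / len(y)

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 10))
y = np.array([2, 7, 0, 5])

analytic = mean_ce_grad(z, y)

# Central finite differences, one logit at a time
eps = 1e-6
numeric = np.zeros_like(z)
for idx in np.ndindex(*z.shape):
    bump = np.zeros_like(z)
    bump[idx] = eps
    numeric[idx] = (mean_ce(z + bump, y) - mean_ce(z - bump, y)) / (2 * eps)

max_abs_diff = np.abs(analytic - numeric).max()
print(max_abs_diff)
```

Each row of the analytic gradient sums to zero, because the softmax probabilities sum to one and exactly one entry has 1 subtracted from it.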

ce_loss

Cross-entropy loss over raw logits. A softmax is applied internally, so do not pass pre-softmaxed probabilities.

\[ \mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \log\frac{e^{z_{i,y_i}}}{\sum_j e^{z_{i,j}}} \]

ce_loss(z: Tensor, y: Tensor, dim: int = -1, reduction: str = 'mean') -> Tensor

Compute cross-entropy loss with built-in softmax.

Numerically stable: uses the log-sum-exp trick internally.
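The log-sum-exp trick can be sketched in plain NumPy. This illustrates the general technique and is not simplegrad's actual implementation; log_softmax and cross_entropy_from_indices are hypothetical helper names, and integer class indices are assumed for the targets.

```python
import numpy as np

def log_softmax(z, axis=-1):
    """Stable log-softmax: subtract the per-row max before exponentiating."""
    shifted = z - np.max(z, axis=axis, keepdims=True)
    # After the shift the largest exponent is exp(0) = 1, so nothing overflows
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

def cross_entropy_from_indices(z, y):
    """Mean cross-entropy for logits z of shape (N, C) and integer labels y of shape (N,)."""
    logp = log_softmax(z, axis=-1)
    picked = logp[np.arange(z.shape[0]), y]  # log-probability of the target class
    return -picked.mean()

# Extreme logits that would overflow a naive exp(z) / sum(exp(z))
z = np.array([[1000.0, 0.0, -1000.0],
              [0.0, 0.0, 0.0]])
y = np.array([0, 2])

loss = cross_entropy_from_indices(z, y)
print(loss)  # finite; equals log(3) / 2 for this input
```

The first row contributes zero loss (the target class dominates completely) and the second contributes log(3) (a uniform distribution over three classes), so the mean is log(3) / 2. A naive softmax would evaluate exp(1000), which overflows to infinity in float64 and yields NaN.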

Parameters:

  • z (Tensor) –

    Logits (raw unnormalized scores), shape (..., num_classes).

  • y (Tensor) –

    Target probability distribution, same shape as z.

  • dim (int, default: -1 ) –

    Class dimension to apply softmax over. Defaults to -1 (last dim).

  • reduction (str, default: 'mean' ) –

    How to reduce the per-sample losses. One of "mean", "sum", or None (return per-sample losses).

Raises:

  • ValueError

    If reduction is not a valid option.
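The parameters above can be tied together in a small NumPy sketch. It follows the parameter descriptions as written (y as a probability distribution with the same shape as z, softmax over dim, then a reduction), but it is only an assumption about how these pieces combine, not simplegrad's source; ce_loss_sketch and the error message are hypothetical.

```python
import numpy as np

def log_softmax(z, axis=-1):
    """Stable log-softmax along the given axis."""
    shifted = z - np.max(z, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

def ce_loss_sketch(z, y, dim=-1, reduction="mean"):
    """Cross-entropy between a target distribution y and softmax(z) along dim."""
    if reduction not in ("mean", "sum", None):
        raise ValueError(f"invalid reduction: {reduction!r}")
    # Per-sample loss: -sum_j y_j * log softmax(z)_j over the class dimension
    per_sample = -np.sum(y * log_softmax(z, axis=dim), axis=dim)
    if reduction == "mean":
        return per_sample.mean()
    if reduction == "sum":
        return per_sample.sum()
    return per_sample  # reduction is None: keep one loss per sample

z = np.zeros((2, 2))                      # uniform predictions over 2 classes
y = np.array([[1.0, 0.0],                 # one-hot target
              [0.5, 0.5]])                # soft target

mean_loss = ce_loss_sketch(z, y)                    # log(2)
sum_loss = ce_loss_sketch(z, y, reduction="sum")    # 2 * log(2)
per_sample = ce_loss_sketch(z, y, reduction=None)   # shape (2,)
```

With uniform predictions over two classes, every sample costs log(2) regardless of the target distribution, which makes the three reduction modes easy to compare by eye.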

mse_loss

\[ \mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)^2 \]

mse_loss(p: Tensor, y: Tensor, reduction: str = 'mean') -> Tensor

Compute mean squared error loss: mean((p - y)^2).

Parameters:

  • p (Tensor) –

    Predictions tensor.

  • y (Tensor) –

    Targets tensor, same shape as p.

  • reduction (str, default: 'mean' ) –

    One of "mean", "sum", or None.

Raises:

  • ValueError

    If reduction is not a valid option.
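As a worked example of the formula, here is a minimal NumPy sketch of mean squared error together with its gradient under the default reduction. It is not simplegrad code: mse_sketch is a hypothetical name, and returning element-wise squared errors when reduction is None is an assumption.

```python
import numpy as np

def mse_sketch(p, y, reduction="mean"):
    """Squared error between predictions p and targets y, with optional reduction."""
    sq_err = (p - y) ** 2
    if reduction == "mean":
        return sq_err.mean()
    if reduction == "sum":
        return sq_err.sum()
    if reduction is None:
        return sq_err  # assumed: unreduced, element-wise squared errors
    raise ValueError(f"invalid reduction: {reduction!r}")

p = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.0, 1.0, 1.0])

mean_loss = mse_sketch(p, y)                   # (0 + 1 + 4 + 9) / 4 = 3.5
sum_loss = mse_sketch(p, y, reduction="sum")   # 0 + 1 + 4 + 9 = 14.0

# Gradient of the mean loss with respect to p: 2 * (p - y) / N
grad_mean = 2.0 * (p - y) / p.size             # [0.0, 0.5, 1.0, 1.5]
```

The gradient grows linearly with the error, which is why MSE penalizes large residuals more heavily than small ones.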