SGD

SGD (Stochastic Gradient Descent) updates each parameter by subtracting its gradient scaled by the learning rate. It optionally supports momentum, which accumulates a per-parameter velocity to dampen oscillations and accelerate convergence along consistent gradient directions. SGD is simpler than Adam, keeping at most one extra buffer per parameter, and is often preferred for training convolutional networks on vision tasks.

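import simplegrad as sg
import simplegrad.nn as nn
import simplegrad.optimizers as optim

# Build a small model and attach an SGD optimizer with momentum to it.
model = nn.Linear(128, 10)
opt = optim.SGD(lr=0.01, momentum=0.9, model=model)

# Forward pass on a dummy batch of 4 inputs, reduced to a scalar loss.
loss = model(sg.ones((4, 128))).sum()
loss.backward()   # compute gradients for the model's parameters
opt.step()        # apply one SGD update
opt.zero_grad()   # clear gradients before the next iteration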

SGD

Bases: Optimizer

Stochastic gradient descent with optional momentum.

Update rule (with momentum):

v_t = momentum * v_{t-1} - lr * (1 - dampening) * grad
param += v_t
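
The update rule can be traced by hand with a small sketch in plain Python. This is not simplegrad's implementation: `sgd_momentum_step` and the toy quadratic objective are hypothetical, and plain lists of floats stand in for parameter tensors. It only mirrors the two lines of the rule as written above.

```python
def sgd_momentum_step(params, grads, velocities,
                      lr=0.01, momentum=0.9, dampening=0.0):
    """Apply one in-place update to lists of floats (illustrative helper)."""
    for i, g in enumerate(grads):
        # v_t = momentum * v_{t-1} - lr * (1 - dampening) * grad
        velocities[i] = momentum * velocities[i] - lr * (1.0 - dampening) * g
        # param += v_t
        params[i] += velocities[i]


# Toy objective: f(w) = 0.5 * (w0**2 + 10 * w1**2), with gradient (w0, 10 * w1).
params = [1.0, 1.0]
velocities = [0.0, 0.0]  # velocities start at zero
for _ in range(200):
    grads = [params[0], 10.0 * params[1]]
    sgd_momentum_step(params, grads, velocities, lr=0.01, momentum=0.9)

# Both coordinates should now be close to the minimum at (0, 0).
print(params)
```

With `momentum=0` the velocity is just the scaled negative gradient, so the step reduces to `param -= lr * (1 - dampening) * grad`.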

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| .lr | float | Default learning rate. |
| .step_count | int | Number of optimization steps taken. |
| .param_groups | list[dict] | Parameter groups with "lr", "momentum", "dampening", and "params". |
| .velocities | dict | Per-parameter velocity arrays used by the momentum update. |
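
To make the relationship between `.param_groups` and `.velocities` concrete, here is a rough sketch of how group-level hyperparameters and per-parameter velocity buffers could fit together. The layout is a guess based only on the attribute descriptions above: the `step` function, the use of `id(param)` as the velocity key, and the plain lists standing in for parameter tensors are all assumptions, not simplegrad's actual internals.

```python
# Hypothetical stand-ins: plain lists play the role of parameter tensors.
weights = [0.5, -0.5]
bias = [0.1]

# Each group carries its own hyperparameters alongside its parameters.
param_groups = [
    {"params": [weights], "lr": 0.1, "momentum": 0.9, "dampening": 0.0},
    {"params": [bias], "lr": 0.01, "momentum": 0.0, "dampening": 0.0},
]

# Assumption: one velocity buffer per parameter, keyed here by id(param).
velocities = {}


def step(param_groups, grads, velocities):
    """Apply the momentum update group by group (illustrative only)."""
    for group in param_groups:
        scale = group["lr"] * (1.0 - group["dampening"])
        for p in group["params"]:
            g = grads[id(p)]
            # Create a zero velocity buffer the first time a parameter is seen.
            v = velocities.setdefault(id(p), [0.0] * len(p))
            for i in range(len(p)):
                v[i] = group["momentum"] * v[i] - scale * g[i]
                p[i] += v[i]


# Made-up gradients for a single step.
grads = {id(weights): [1.0, -1.0], id(bias): [1.0]}
step(param_groups, grads, velocities)
```

After this single step, `weights` has moved by its group's `lr` of 0.1 and `bias` by its group's `lr` of 0.01, and `velocities` holds one buffer for each parameter.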

Methods

| Method | Description |
| --- | --- |
| .step() | Apply one SGD update step to all parameters. |
| .state() | Return the full optimizer state including velocities. |

Inherits .zero_grad(), .reset_step_count(), .set_param() from Optimizer.