Adam

Adam (Adaptive Moment Estimation) maintains per-parameter estimates of the first and second moments of the gradients, which lets it adapt the effective learning rate for each parameter individually. It combines the benefits of momentum and RMSProp and is one of the most widely used optimizers for training deep networks. The standard defaults (beta_1=0.9, beta_2=0.999) work well across a broad range of tasks.

import simplegrad as sg
import simplegrad.nn as nn
import simplegrad.optimizers as optim

model = nn.Linear(128, 10)
opt = optim.Adam(lr=1e-3, model=model)

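loss = model(sg.ones((4, 128))).sum()  # forward pass on a dummy batch
loss.backward()                        # compute gradients
opt.step()                             # apply one Adam update
opt.zero_grad()                        # clear gradients before the next step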

`Adam`

Bases: Optimizer

Adam optimizer with bias-corrected moment estimates.

Update rule (maximize=False)::

m_t = beta_1 * m_{t-1} + (1 - beta_1) * grad
v_t = beta_2 * v_{t-1} + (1 - beta_2) * grad^2
m_hat = m_t / (1 - beta_1^t)
v_hat = v_t / (1 - beta_2^t)
param -= lr * m_hat / (sqrt(v_hat) + eps)

When maximize=True the sign is flipped so the update ascends the objective (useful for reinforcement learning or contrastive objectives).
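
The update rule can be traced by hand for a single scalar parameter. The following is a minimal sketch in plain Python, not simplegrad's implementation: `adam_step` is a hypothetical helper, and it assumes that maximize=True is equivalent to negating the gradient before the moment updates (which gives the same result as flipping the sign of the final update, because the first moment is linear in the gradient and the second moment uses its square).

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta_1=0.9, beta_2=0.999,
              eps=1e-8, maximize=False):
    """One Adam update for a scalar parameter (hypothetical helper).

    `t` is the 1-based step count. Returns the new (param, m, v).
    """
    # Assumption: ascending the objective is modelled as descending its negation.
    if maximize:
        grad = -grad

    # Exponential moving averages of the gradient and the squared gradient.
    m = beta_1 * m + (1 - beta_1) * grad
    v = beta_2 * v + (1 - beta_2) * grad * grad

    # Bias correction: both averages start at zero, so early values are
    # scaled up to compensate.
    m_hat = m / (1 - beta_1 ** t)
    v_hat = v / (1 - beta_2 ** t)

    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from zero-initialised moments.
p_min, m1, v1 = adam_step(param=1.0, grad=4.0, m=0.0, v=0.0, t=1)
p_max, _, _ = adam_step(param=1.0, grad=4.0, m=0.0, v=0.0, t=1, maximize=True)
```

At t=1 the bias correction gives m_hat = grad and v_hat = grad^2, so the first step has magnitude close to lr regardless of the gradient's scale: here `p_min` is about 0.999 and `p_max` about 1.001.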

Supports parameter groups, allowing different lr, beta_1, beta_2, eps, and maximize per group. Pass a list of dicts to param_groups, each with a "params" key (a Module or a dict[str, Tensor]) and optional per-group overrides:

>>> optimizer = Adam(
...     lr=1e-3,
...     param_groups=[
...         {"params": model.encoder},
...         {"params": model.decoder, "lr": 1e-4, "beta_1": 0.8},
...     ],
... )
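
One way to picture how per-group overrides interact with the defaults is a dictionary merge in which each group's own keys win. The sketch below illustrates that idea only; `resolve_groups` is a hypothetical helper, the strings stand in for modules, and simplegrad's internal handling may differ.

```python
def resolve_groups(param_groups, **defaults):
    """Fill each group's missing hyperparameters from the defaults.

    Hypothetical helper for illustration; not part of simplegrad.
    """
    resolved = []
    for group in param_groups:
        if "params" not in group:
            raise KeyError('each parameter group needs a "params" key')
        # Keys set on the group take precedence over the defaults.
        resolved.append({**defaults, **group})
    return resolved


groups = resolve_groups(
    [
        {"params": "encoder"},  # placeholder for model.encoder
        {"params": "decoder", "lr": 1e-4, "beta_1": 0.8},  # placeholder for model.decoder
    ],
    lr=1e-3, beta_1=0.9, beta_2=0.999, eps=1e-8, maximize=False,
)
```

Under this reading, the encoder group trains with every default, while the decoder group uses lr=1e-4 and beta_1=0.8 and still inherits beta_2, eps, and maximize.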

Parameters:

model (Module | None, default: None ) –

The model whose parameters to optimize (single-group shorthand).
lr (float, default: 0.001 ) –

Default learning rate. Defaults to 1e-3.
beta_1 (float, default: 0.9 ) –

Default exponential decay for the first moment. Defaults to 0.9.
beta_2 (float, default: 0.999 ) –

Default exponential decay for the second moment. Defaults to 0.999.
eps (float, default: 1e-08 ) –

Default numerical stability constant. Defaults to 1e-8.
maximize (bool, default: False ) –

If True, maximizes the objective instead of minimizing it. Defaults to False.
param_groups (list[dict] | None, default: None ) –

List of parameter group dicts with optional per-group overrides for lr, beta_1, beta_2, eps, and maximize.

Attributes

Attribute	Type	Description
`.lr`	`float`	Default learning rate.
`.step_count`	`int`	Number of optimization steps taken.
`.param_groups`	`list[dict]`	Parameter groups with `"lr"`, `"beta_1"`, `"beta_2"`, `"eps"`, and `"params"`.

Methods

Method	Description
`.step()`	Apply one Adam update step to all parameters.
`.state()`	Return the full optimizer state including moment estimates.

Inherits .zero_grad(), .reset_step_count(), .set_param() from Optimizer.