Schedulers

Learning rate schedulers adjust the optimizer's learning rate over the course of training. Simplegrad provides function-based schedules (LinearLR, ExponentialLR, CosineAnnealingLR) that change the rate according to a fixed formula, and a metric-based schedule (ReduceLROnPlateauLR) that reduces the rate when a monitored metric stops improving. Call scheduler.step() once per epoch after the optimizer step.

import simplegrad as sg
import simplegrad.nn as nn
import simplegrad.optimizers as optim
import simplegrad.schedulers as schedulers

model = nn.Linear(64, 10)
opt = optim.Adam(lr=1e-2, model=model)
scheduler = schedulers.CosineAnnealingLR(opt, T_0=50)  # T_0: length of the first period

for epoch in range(50):
    # ... training loop ...
    scheduler.step()

All schedulers inherit from Scheduler.


LinearLR

Bases: Scheduler

Methods

Method Description
.step() Linearly interpolate the learning rate for the current step.
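
This page does not list the parameters of LinearLR, so the following is only a minimal sketch of what linear interpolation of a learning rate usually means. The function name and the arguments start_lr, end_lr and total_steps are hypothetical and are not taken from the Simplegrad API.

```python
def linear_lr(start_lr: float, end_lr: float, total_steps: int, step: int) -> float:
    """Linearly interpolate between start_lr and end_lr.

    Hypothetical helper for illustration; not part of Simplegrad.
    """
    # Fraction of the schedule completed, clamped to 1 once total_steps is reached.
    t = min(step, total_steps) / total_steps
    return start_lr + t * (end_lr - start_lr)


# Learning rate at the start, midpoint and end of a 10-step schedule.
schedule = [linear_lr(0.1, 0.0, 10, s) for s in (0, 5, 10)]
```

Clamping at total_steps is one possible design choice: the rate stays at end_lr if step() keeps being called after the schedule has finished.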

ExponentialLR

Bases: Scheduler

Decays the learning rate by a multiplicative factor each step.

Computes the learning rate as

lr = start_lr * gamma^steps

Any three of start_lr, end_lr, total_steps, gamma can be provided to fully define the schedule. Alternatively, only start_lr and gamma can be provided for an infinite decay.
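
The "any three of four" rule follows from the formula: requiring lr = end_lr at steps = total_steps leaves a single unknown. The sketch below solves for gamma in plain Python. The helper names are hypothetical, and how Simplegrad resolves the missing value internally is an assumption.

```python
def resolve_gamma(start_lr: float, end_lr: float, total_steps: int) -> float:
    """Solve end_lr = start_lr * gamma**total_steps for gamma.

    Hypothetical helper for illustration; not part of Simplegrad.
    """
    return (end_lr / start_lr) ** (1.0 / total_steps)


def exponential_lr(start_lr: float, gamma: float, steps: int) -> float:
    """Evaluate lr = start_lr * gamma**steps."""
    return start_lr * gamma ** steps


# Decay from 1e-2 to 1e-4 over 100 steps.
gamma = resolve_gamma(start_lr=1e-2, end_lr=1e-4, total_steps=100)
final_lr = exponential_lr(1e-2, gamma, 100)  # close to 1e-4, up to rounding
```

When only start_lr and gamma are given, the same exponential_lr expression applies with no upper bound on steps, which corresponds to the infinite decay case.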

Parameters:

  • optimizer (Optimizer) –

    The optimizer whose learning rate should be scheduled.

  • start_lr (float | None, default: None ) –

    Initial learning rate.

  • end_lr (float | None, default: None ) –

    Final learning rate after total_steps.

  • total_steps (int | None, default: None ) –

    Number of steps over which to decay.

  • gamma (float | None, default: None ) –

    Multiplicative factor applied each step.

Methods

Method Description
.step() Multiply the learning rate by gamma each step.

CosineAnnealingLR

Bases: Scheduler

Sets the learning rate using cosine annealing with warm restarts.

The learning rate follows

lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t_cur / T_i))

where t_cur is the number of steps since the last restart and T_i is the length of the current period. After each period expires, t_cur resets to 0 and T_i is multiplied by T_mult.
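
To make the restart behaviour concrete, here is a minimal pure-Python sketch that generates the schedule from the formula above. It does not use Simplegrad. The function name is hypothetical, and the exact point at which the library resets t_cur relative to a step() call is an assumption.

```python
import math


def cosine_warm_restart_lrs(lr_max, T_0, n_steps, T_mult=1, lr_min=0.0):
    """Return the learning rate at each of n_steps steps.

    Hypothetical helper for illustration; not part of Simplegrad.
    """
    lrs = []
    t_cur, T_i = 0, T_0
    for _ in range(n_steps):
        lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / T_i))
        lrs.append(lr)
        t_cur += 1
        if t_cur >= T_i:
            # Period expired: restart at lr_max and lengthen the next period.
            t_cur = 0
            T_i *= T_mult
    return lrs


# First period lasts 4 steps, the second lasts 8 (T_mult=2).
lrs = cosine_warm_restart_lrs(lr_max=0.1, T_0=4, n_steps=12, T_mult=2)
```

In this sketch the rate returns to lr_max at step 4, the first restart, and then decays more slowly over the longer second period.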

Parameters:

  • optimizer (Optimizer) –

    The optimizer whose learning rate should be scheduled.

  • T_0 (int) –

    Number of steps in the first restart period.

  • T_mult (int, default: 1 ) –

    Factor by which the period length is multiplied after each restart. Default is 1 (period length never changes).

  • lr_min (float, default: 0.0 ) –

    Minimum learning rate. Default is 0.

  • lr_max (float | None, default: None ) –

    Peak learning rate at the start of each period. If None, the optimizer's current learning rate is used. If provided, the optimizer's learning rate is set to this value immediately.

Methods

Method Description
.step() Update the learning rate following a cosine annealing schedule.

ReduceLROnPlateauLR

Bases: Scheduler

Reduce learning rate when a metric has stopped improving.

After each call to step(), this scheduler compares the provided metric against the best value observed so far. When the metric has not improved for patience consecutive steps, the learning rate is reduced by factor. Lowering the rate in this way can help training make further progress once it has plateaued.
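
The patience logic can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Simplegrad implementation: the class name is hypothetical, lower metric values are assumed to be better, and "reduced by factor" is assumed to mean multiplying the learning rate by factor.

```python
class PlateauTracker:
    """Hypothetical stand-in that illustrates the patience counter."""

    def __init__(self, lr, factor=0.5, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")  # assumes lower is better (e.g. a loss)
        self.bad_steps = 0

    def step(self, metric):
        if metric < self.best:
            # Improvement: remember it and reset the counter.
            self.best = metric
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps >= self.patience:
                # No improvement for `patience` consecutive steps.
                self.lr *= self.factor
                self.bad_steps = 0
        return self.lr


tracker = PlateauTracker(lr=0.1, factor=0.5, patience=2)
history = [tracker.step(loss) for loss in [1.0, 0.8, 0.8, 0.81, 0.7]]
```

Here the third and fourth losses fail to beat the best value of 0.8, so the rate is halved on the fourth call and the counter starts over.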

Methods

Method Description
.step() Check the monitored metric and reduce the learning rate if on a plateau.