Schedulers
Learning rate schedulers adjust the optimizer's learning rate over the course of training. Simplegrad provides function-based schedules (LinearLR, ExponentialLR, CosineAnnealingLR) that change the rate according to a fixed formula, and a metric-based schedule (ReduceLROnPlateauLR) that reduces the rate when a monitored metric stops improving. Call scheduler.step() once per epoch after the optimizer step.
import simplegrad as sg
import simplegrad.nn as nn
import simplegrad.optimizers as optim
import simplegrad.schedulers as schedulers

model = nn.Linear(64, 10)
opt = optim.Adam(lr=1e-2, model=model)
# T_0 is the length of the first cosine period (see CosineAnnealingLR below)
scheduler = schedulers.CosineAnnealingLR(opt, T_0=50)

for epoch in range(50):
    # ... training loop ...
    scheduler.step()  # advance the schedule once per epoch
All schedulers inherit from Scheduler.
LinearLR
Bases: Scheduler
Methods
| Method | Description |
|---|---|
| .step() | Linearly interpolate the learning rate for the current step. |
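The constructor parameters of LinearLR are not listed here, so the following is only a sketch of what linear interpolation of a learning rate usually looks like. The function name linear_lr and the arguments start_lr, end_lr and total_steps are hypothetical and may not match Simplegrad's actual signature.

```python
def linear_lr(start_lr, end_lr, total_steps, step):
    """Learning rate at `step` on a straight line from start_lr to end_lr.

    Hypothetical helper for illustration; not part of Simplegrad.
    """
    # Clamp so the rate stays at end_lr once total_steps is reached.
    fraction = min(step, total_steps) / total_steps
    return start_lr + (end_lr - start_lr) * fraction


if __name__ == "__main__":
    for step in (0, 5, 10, 20):
        print(step, linear_lr(0.1, 0.0, total_steps=10, step=step))
```

In this sketch the rate is held at end_lr after total_steps; whether LinearLR clamps in the same way is an assumption.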
ExponentialLR
Bases: Scheduler
Decays the learning rate by a multiplicative factor each step.
Computes the learning rate as
lr = start_lr * gamma^steps
Providing any three of start_lr, end_lr, total_steps, and gamma fully defines the schedule; the fourth follows from end_lr = start_lr * gamma^total_steps. Alternatively, provide only start_lr and gamma for a decay with no fixed end point.
Parameters:
- optimizer (Optimizer) – The optimizer whose learning rate should be scheduled.
- start_lr (float | None, default: None) – Initial learning rate.
- end_lr (float | None, default: None) – Final learning rate after total_steps.
- total_steps (int | None, default: None) – Number of steps over which to decay.
- gamma (float | None, default: None) – Multiplicative factor applied each step.
Methods
| Method | Description |
|---|---|
| .step() | Multiply the learning rate by gamma each step. |
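The relationship between start_lr, end_lr, total_steps and gamma described for ExponentialLR can be worked through in plain Python. This is a minimal sketch that assumes end_lr = start_lr * gamma^total_steps; the helper names solve_exponential_schedule and exponential_lr are hypothetical, and the library's own resolution logic may differ.

```python
import math


def solve_exponential_schedule(start_lr=None, end_lr=None, total_steps=None, gamma=None):
    """Derive the one missing value from end_lr = start_lr * gamma ** total_steps.

    Hypothetical helper for illustration; not part of Simplegrad.
    """
    values = [start_lr, end_lr, total_steps, gamma]
    if sum(v is None for v in values) != 1:
        raise ValueError("provide exactly three of the four values")
    if gamma is None:
        gamma = (end_lr / start_lr) ** (1.0 / total_steps)
    elif end_lr is None:
        end_lr = start_lr * gamma ** total_steps
    elif start_lr is None:
        start_lr = end_lr / gamma ** total_steps
    else:
        # Round because the number of steps has to be an integer.
        total_steps = round(math.log(end_lr / start_lr) / math.log(gamma))
    return start_lr, end_lr, total_steps, gamma


def exponential_lr(start_lr, gamma, step):
    """Closed form lr = start_lr * gamma^step from the formula above."""
    return start_lr * gamma ** step


if __name__ == "__main__":
    start, end, steps, g = solve_exponential_schedule(start_lr=1e-2, end_lr=1e-4, total_steps=100)
    print(f"gamma = {g:.5f}")
    print(f"lr after {steps} steps = {exponential_lr(start, g, steps):.6f}")
```

Deriving gamma from the two endpoints is convenient when the target final rate is known but the per-step factor is not.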
CosineAnnealingLR
Bases: Scheduler
Sets the learning rate using cosine annealing with warm restarts.
The learning rate follows
lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t_cur / T_i))
where t_cur is the number of steps since the last restart and T_i is the length of the current period. After each period expires, t_cur resets to 0 and T_i is multiplied by T_mult.
Parameters:
- optimizer (Optimizer) – The optimizer whose learning rate should be scheduled.
- T_0 (int) – Number of steps in the first restart period.
- T_mult (int, default: 1) – Factor by which the period length is multiplied after each restart. Default is 1 (period length never changes).
- lr_min (float, default: 0.0) – Minimum learning rate. Default is 0.
- lr_max (float | None, default: None) – Peak learning rate at the start of each period. If None, the optimizer's current learning rate is used. If provided, the optimizer's learning rate is set to this value immediately.
Methods
| Method | Description |
|---|---|
| .step() | Update the learning rate following a cosine annealing schedule. |
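To see how t_cur, T_i and T_mult interact in the CosineAnnealingLR formula, the schedule can be traced in plain Python. This is a sketch under stated assumptions: the function cosine_warm_restart_lrs is hypothetical, and the exact step at which the library applies a restart may differ by one.

```python
import math


def cosine_warm_restart_lrs(lr_max, T_0, num_steps, T_mult=1, lr_min=0.0):
    """Trace lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t_cur / T_i)).

    Hypothetical helper for illustration; not part of Simplegrad.
    """
    lrs = []
    t_cur, T_i = 0, T_0
    for _ in range(num_steps):
        lrs.append(lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / T_i)))
        t_cur += 1
        if t_cur >= T_i:
            # Period expired: restart at the peak and stretch the next period.
            t_cur = 0
            T_i *= T_mult
    return lrs


if __name__ == "__main__":
    for step, lr in enumerate(cosine_warm_restart_lrs(0.1, T_0=4, num_steps=12, T_mult=2)):
        print(step, round(lr, 4))
```

With T_0=4 and T_mult=2, the first period covers steps 0 to 3 and the second covers steps 4 to 11, so the rate returns to lr_max at step 4.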
ReduceLROnPlateauLR
Bases: Scheduler
Reduce learning rate when a metric has stopped improving.
After each call to step(), this scheduler compares the provided metric against the best observed value. When the metric has not improved for patience consecutive steps, the learning rate is reduced by factor. This allows the optimizer to escape plateaus and continue converging.
Methods
| Method | Description |
|---|---|
| .step() | Check the monitored metric and reduce the learning rate if on a plateau. |
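The plateau logic described for ReduceLROnPlateauLR can be sketched without the library. The class PlateauTracker below is hypothetical. It assumes a lower metric is better (for example, validation loss), that "reduced by factor" means multiplying the rate by factor, and that the counter resets after each reduction; none of these details are confirmed by this page.

```python
class PlateauTracker:
    """Standalone illustration of plateau-based decay; not part of Simplegrad."""

    def __init__(self, lr, factor=0.5, patience=2):
        self.lr = lr
        self.factor = factor      # assumed: multiplier applied on a plateau
        self.patience = patience  # consecutive non-improving steps tolerated
        self.best = float("inf")  # assumed: lower metric is better
        self.bad_steps = 0

    def step(self, metric):
        if metric < self.best:
            self.best = metric
            self.bad_steps = 0
        else:
            self.bad_steps += 1
        if self.bad_steps >= self.patience:
            self.lr *= self.factor
            self.bad_steps = 0  # assumed: start counting again after a reduction
        return self.lr


if __name__ == "__main__":
    tracker = PlateauTracker(lr=0.1, factor=0.5, patience=2)
    for loss in [1.0, 0.9, 0.95, 0.92, 0.8]:
        print(loss, tracker.step(loss))
```

In this trace the loss fails to beat 0.9 on two consecutive steps, so the rate is halved on the fourth call and then held while the loss improves again.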