Adam
simplegrad.optimizers.adam.Adam
Bases: Optimizer
Adam optimizer with bias-corrected moment estimates.
Update rule:
m_t = beta_1 * m_{t-1} + (1 - beta_1) * grad
v_t = beta_2 * v_{t-1} + (1 - beta_2) * grad^2
m_hat = m_t / (1 - beta_1^t)
v_hat = v_t / (1 - beta_2^t)
param -= lr * m_hat / (sqrt(v_hat) + eps)
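As a minimal sketch of this rule (plain NumPy standing in for simplegrad tensors; the function and argument names are illustrative, not the library's internals), one bias-corrected update for a single parameter looks like:

```python
import numpy as np

def adam_update(param, grad, m, v, t, lr=1e-3, beta_1=0.9, beta_2=0.999, eps=1e-8):
    """One Adam step for a single parameter (illustrative NumPy sketch, not simplegrad's API)."""
    # Exponential moving averages of the gradient and the squared gradient
    m = beta_1 * m + (1 - beta_1) * grad
    v = beta_2 * v + (1 - beta_2) * grad ** 2
    # Bias correction compensates for the zero-initialised moment estimates (t starts at 1)
    m_hat = m / (1 - beta_1 ** t)
    v_hat = v / (1 - beta_2 ** t)
    # Scaled parameter update
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```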
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Module` | The model whose parameters to optimize. | *required* |
| `lr` | `float` | Learning rate. | *required* |
| `beta_1` | `float` | Exponential decay for the first moment. Defaults to 0.9. | `0.9` |
| `beta_2` | `float` | Exponential decay for the second moment. Defaults to 0.999. | `0.999` |
| `eps` | `float` | Numerical stability constant. Defaults to 1e-8. | `1e-08` |
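A minimal construction example (only the import path and the `Adam(...)` signature come from this page; the `Net` module is a hypothetical `Module` subclass used for illustration):

```python
from simplegrad.optimizers.adam import Adam

# `Net` is a hypothetical Module subclass; any simplegrad Module works here.
model = Net()
optimizer = Adam(model, lr=1e-3, beta_1=0.9, beta_2=0.999, eps=1e-8)
```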
Source code in simplegrad/optimizers/adam.py
step()
Apply one Adam update step to all model parameters.
Raises:

| Type | Description |
|---|---|
| `ValueError` | If any parameter's gradient is `None` (i.e. `backward()` was not called). |
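A hedged sketch of where `step()` fits in a training loop; only `step()` and its `ValueError` behaviour come from this page, while the data iterator and loss helper are assumptions:

```python
# Hypothetical training loop around the optimizer built above.
for x, y in data:                        # assumed data iterator
    loss = compute_loss(model, x, y)     # assumed helper: forward pass + loss
    loss.backward()                      # must run before step(); otherwise gradients are None
    optimizer.step()                     # one Adam update for every model parameter
```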