Apply one Adam update step to all parameters across all groups.
Uses the lr, beta_1, beta_2, eps, and maximize values stored in each
parameter group, so different groups may use different hyperparameters.
Raises:
- ValueError – If any parameter's gradient is None (for example, because backward was not called).
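
The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not this library's implementation: the `Param` class, the `adam_step` function, the `state` dictionary, and the `params` key are hypothetical names introduced for illustration, parameters are plain scalars, and checking every gradient before updating anything is an assumption. Only the group keys lr, beta_1, beta_2, eps, and maximize come from the description above. The update is the standard bias-corrected Adam rule.

```python
import math


class Param:
    """Hypothetical scalar parameter: a value plus an optional gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = None  # stays None until a backward pass fills it in


def adam_step(groups, state):
    """Apply one Adam update to every parameter in every group (sketch)."""
    # Assumption: validate all gradients first so a failure leaves
    # every parameter unchanged.
    for group in groups:
        for p in group["params"]:
            if p.grad is None:
                raise ValueError("parameter gradient is None; call backward first")

    for group in groups:
        # Hyperparameters are read per group, so groups may differ.
        lr, b1, b2 = group["lr"], group["beta_1"], group["beta_2"]
        eps, maximize = group["eps"], group["maximize"]
        for p in group["params"]:
            # Maximizing an objective is minimizing its negation.
            g = -p.grad if maximize else p.grad
            s = state.setdefault(id(p), {"t": 0, "m": 0.0, "v": 0.0})
            s["t"] += 1
            s["m"] = b1 * s["m"] + (1 - b1) * g        # first-moment estimate
            s["v"] = b2 * s["v"] + (1 - b2) * g * g    # second-moment estimate
            m_hat = s["m"] / (1 - b1 ** s["t"])        # bias correction
            v_hat = s["v"] / (1 - b2 ** s["t"])
            p.value -= lr * m_hat / (math.sqrt(v_hat) + eps)


# Usage: two groups with different learning rates and directions.
w, b = Param(1.0), Param(1.0)
groups = [
    {"params": [w], "lr": 0.1, "beta_1": 0.9, "beta_2": 0.999,
     "eps": 1e-8, "maximize": False},
    {"params": [b], "lr": 0.01, "beta_1": 0.9, "beta_2": 0.999,
     "eps": 1e-8, "maximize": True},
]
w.grad, b.grad = 2.0, 2.0
state = {}
adam_step(groups, state)
```

On the first step the bias-corrected ratio has magnitude close to 1, so each parameter moves by roughly its group's lr: `w` decreases by about 0.1, while `b`, whose group sets maximize, increases by about 0.01.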