Apply one SGD update step to all parameters across all groups.
Uses the lr, momentum, and dampening stored in each parameter group,
so different groups may use different hyperparameters.
Raises:
- ValueError – If any parameter gradient is None (forgot to call backward).
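
The behavior described above can be sketched in plain Python. This is a minimal illustration, not the actual implementation: the function name `sgd_step`, the dict-based layout of groups and parameters (`"params"`, `"value"`, `"grad"`, `"buf"`), and the defaults of 0.0 for momentum and dampening are all assumptions made for the example. The momentum rule follows the common PyTorch-style convention, which the source does not confirm.

```python
def sgd_step(param_groups):
    """Apply one SGD step to every parameter in every group (illustrative sketch)."""
    # Check all gradients first so a missing one does not leave
    # some parameters updated and others not.
    for group in param_groups:
        for p in group["params"]:
            if p["grad"] is None:
                raise ValueError("parameter gradient is None; call backward first")

    for group in param_groups:
        # Hyperparameters are read per group, so groups can differ.
        lr = group["lr"]
        momentum = group.get("momentum", 0.0)    # assumed default
        dampening = group.get("dampening", 0.0)  # assumed default
        for p in group["params"]:
            update = p["grad"]
            if momentum != 0.0:
                buf = p.get("buf")
                if buf is None:
                    # First step: the buffer starts as the raw gradient.
                    buf = update
                else:
                    buf = momentum * buf + (1.0 - dampening) * update
                p["buf"] = buf
                update = buf
            p["value"] -= lr * update


# Two hypothetical groups with different hyperparameters.
w = {"value": 1.0, "grad": 0.5}
b = {"value": 0.0, "grad": -2.0}
groups = [
    {"params": [w], "lr": 0.1, "momentum": 0.9, "dampening": 0.0},
    {"params": [b], "lr": 0.01},
]
sgd_step(groups)
```

Validating every gradient before touching any parameter is a design choice of this sketch; it keeps a failed call from leaving the parameters in a half-updated state.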