Activation Layers

Activation layers are Module wrappers around the functional activation ops, which makes them easy to compose inside Sequential or custom Module subclasses. No layer holds trainable parameters; each forward method simply calls the corresponding function from simplegrad.functions.activations.

import simplegrad as sg
import simplegrad.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)
out = model(sg.ones((4, 16)))

All activation layers inherit from Module and share its methods (.parameters(), .to_device(), .set_train_mode(), etc.).
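
The wrapper pattern described above can be sketched in plain Python. The `Module` base class, the `relu` function, and the list-based inputs below are simplified stand-ins introduced for illustration. They are assumptions, not simplegrad's actual implementation.

```python
class Module:
    """Hypothetical minimal stand-in for a Module base class."""

    def parameters(self):
        # Activation layers hold no trainable parameters.
        return []

    def forward(self, x):
        raise NotImplementedError

    def __call__(self, x):
        # Calling the layer dispatches to forward().
        return self.forward(x)


def relu(xs):
    """Functional op (stand-in): max(0, x) applied element-wise to a list."""
    return [max(0.0, x) for x in xs]


class ReLU(Module):
    """Layer wrapper: forward() only delegates to the functional op."""

    def forward(self, x):
        return relu(x)


layer = ReLU()
out = layer([-2.0, 0.0, 3.5])
print(out)                 # [0.0, 0.0, 3.5]
print(layer.parameters())  # []
```

Keeping the math in a function and the layer as a thin wrapper means the same op can be called directly or dropped into a Sequential container.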

ReLU

`ReLU`

Bases: Module

ReLU activation layer: max(0, x).

Method	Description
`.forward()`	Apply `max(0, x)` element-wise.

ELU

`ELU`

Bases: Module

ELU (Exponential Linear Unit) activation layer.

Applies elu(x, alpha) element-wise. See simplegrad.functions.activations.elu for the full definition.

Parameters:

alpha (float, default: 1.0 ) –

Scale of the negative region: for large negative inputs the output saturates at -alpha. Defaults to 1.0.

Attributes

Attribute	Type	Description
`.alpha`	`float`	Scale for the negative saturation region. Defaults to `1.0`.

Methods

Method	Description
`.forward()`	Apply ELU activation element-wise.
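
As a rough illustration of what `alpha` controls, here is a scalar sketch of the standard ELU definition using only the Python standard library. It assumes the library's `elu` follows this usual formula. The standalone `elu` function below is hypothetical and is not simplegrad code.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Standard ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    if x > 0:
        return x
    # expm1 computes exp(x) - 1 accurately for small |x|.
    return alpha * math.expm1(x)


for alpha in (1.0, 0.5):
    # Positive inputs pass through; large negative inputs approach -alpha.
    print(alpha, elu(2.0, alpha), elu(-20.0, alpha))
```

Smaller values of `alpha` pull the negative plateau closer to zero, while positive inputs are unaffected.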

Tanh

`Tanh`

Bases: Module

Tanh activation layer: tanh(x).

Method	Description
`.forward()`	Apply `tanh(x)` element-wise.

Sigmoid

`Sigmoid`

Bases: Module

Sigmoid activation layer: 1 / (1 + exp(-x)).

Method	Description
`.forward()`	Apply `1 / (1 + exp(-x))` element-wise.
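
The formula `1 / (1 + exp(-x))` can overflow for large negative inputs when evaluated literally. The scalar sketch below shows one common way to evaluate it stably. It is an illustration only; whether simplegrad uses this formulation internally is not stated here.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable evaluation of 1 / (1 + exp(-x))."""
    if x >= 0:
        # exp(-x) <= 1 here, so nothing overflows.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form exp(x) / (1 + exp(x)).
    z = math.exp(x)
    return z / (1.0 + z)


print(sigmoid(0.0))      # 0.5
print(sigmoid(-1000.0))  # 0.0, where the naive formula raises OverflowError
```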

GELU

`GELU`

Bases: Module

GELU (Gaussian Error Linear Unit) activation layer.

Applies gelu(x, mode) element-wise. See simplegrad.functions.activations.gelu for the full definition and the difference between modes.

Parameters:

mode (str, default: 'erf' ) –

Computation mode: "erf" (exact, default) or "tanh" (approximation).

Attributes

Attribute	Type	Description
`.mode`	`str`	Computation mode: `"erf"` (exact) or `"tanh"` (faster approximation). Defaults to `"erf"`.

Methods

Method	Description
`.forward()`	Apply GELU activation element-wise.
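
To make the two modes concrete, the sketch below compares the standard erf-based GELU with the widely used tanh approximation on a few scalar inputs. The standalone `gelu` function is a hypothetical stand-in, and it assumes the library's modes correspond to these textbook formulas.

```python
import math


def gelu(x: float, mode: str = "erf") -> float:
    """GELU in its exact ("erf") or approximate ("tanh") form."""
    if mode == "erf":
        # Exact: x * Phi(x), with Phi the standard normal CDF.
        return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
    if mode == "tanh":
        # Widely used tanh approximation of the same curve.
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    raise ValueError(f"unknown mode: {mode!r}")


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    exact, approx = gelu(x, "erf"), gelu(x, "tanh")
    print(f"x={x:+.1f}  erf={exact:.6f}  tanh={approx:.6f}  diff={abs(exact - approx):.2e}")
```

The two curves differ only slightly, so the choice mostly matters when matching a reference implementation or checkpoint that was trained with one specific mode.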

Softmax

`Softmax`

Bases: Module

Softmax activation layer: exp(x_i) / sum_j exp(x_j), normalized over dim.

Parameters:

dim (int | None, default: None ) –

Dimension to normalize over. Defaults to None (all elements).

Attributes

Attribute	Type	Description
`.dim`	`int | None`	Axis along which softmax is computed; `None` normalizes over all elements.

Methods

Method	Description
`.forward()`	Apply softmax along the configured axis.
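
The effect of `dim` can be illustrated on a small 2-D input. This sketch uses nested Python lists and the hypothetical helpers `softmax` and `softmax_2d`, which are not part of simplegrad. It assumes the semantics described above: `None` normalizes over all elements, while an integer normalizes along that axis.

```python
import math


def softmax(values):
    """Softmax over a flat list, shifted by the max for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_2d(rows, dim=None):
    """Softmax over a list of rows: dim=None (all elements), 0, or 1."""
    n_cols = len(rows[0])
    if dim is None:
        # Normalize over every element, then restore the row structure.
        flat = softmax([v for row in rows for v in row])
        return [flat[i * n_cols:(i + 1) * n_cols] for i in range(len(rows))]
    if dim == 1:
        # Each row sums to 1.
        return [softmax(row) for row in rows]
    if dim == 0:
        # Each column sums to 1: normalize columns, then transpose back.
        cols = [softmax(list(col)) for col in zip(*rows)]
        return [list(row) for row in zip(*cols)]
    raise ValueError("dim must be None, 0, or 1")


x = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
per_row = softmax_2d(x, dim=1)
overall = softmax_2d(x, dim=None)
print([sum(r) for r in per_row])     # each row sums to ~1
print(sum(sum(r) for r in overall))  # the whole array sums to ~1
```

For a batch of logits shaped (batch, classes), normalizing along the class axis is usually what is wanted, because leaving `dim` as `None` would make all samples in the batch share one probability distribution.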