Pooling

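max_pool2d downsamples a spatial feature map by taking the maximum value within each pooling window. With the default stride (equal to kernel_size) the windows do not overlap; a smaller stride makes them overlap. Pooling reduces spatial resolution while preserving the strongest activations, making the representation more compact and more robust to small translations. During backpropagation, the gradient is routed only to the position that held the maximum in the forward pass; every other position in the window receives zero gradient from that window.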

import simplegrad as sg

x = sg.normal((1, 16, 28, 28), requires_grad=True)
out = sg.max_pool2d(x, kernel_size=2, stride=2)
# out.shape == (1, 16, 14, 14)
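
To make the forward and backward behaviour concrete, here is a minimal pure-Python sketch for a single H x W feature map. It is an illustration only, not simplegrad's implementation: the helper names `max_pool2d_forward` and `max_pool2d_backward` are hypothetical, padding is ignored, and ties are broken by taking the first maximum in row-major order, which is an assumption.

```python
def max_pool2d_forward(fmap, kernel_size=2, stride=2):
    """Max-pool one H x W feature map (list of lists).

    Returns the pooled map and, for each output cell, the (row, col)
    input position that supplied the maximum.
    """
    h, w = len(fmap), len(fmap[0])
    out_h = (h - kernel_size) // stride + 1
    out_w = (w - kernel_size) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    argmax = [[None] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # All input positions covered by this window.
            positions = [
                (i * stride + di, j * stride + dj)
                for di in range(kernel_size)
                for dj in range(kernel_size)
            ]
            # max() keeps the first maximum on ties (assumed tie-breaking rule).
            best = max(positions, key=lambda rc: fmap[rc[0]][rc[1]])
            argmax[i][j] = best
            out[i][j] = fmap[best[0]][best[1]]
    return out, argmax


def max_pool2d_backward(grad_out, argmax, in_shape):
    """Route each output gradient to the input position that held the max."""
    h, w = in_shape
    grad_in = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(argmax):
        for j, (r, c) in enumerate(row):
            # Accumulate: with overlapping windows one input can win several times.
            grad_in[r][c] += grad_out[i][j]
    return grad_in


fmap = [
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 4.0, 7.0, 8.0],
    [9.0, 10.0, 13.0, 14.0],
    [11.0, 12.0, 15.0, 16.0],
]
out, argmax = max_pool2d_forward(fmap, kernel_size=2, stride=2)
# out == [[4.0, 8.0], [12.0, 16.0]]
grad_in = max_pool2d_backward([[1.0, 1.0], [1.0, 1.0]], argmax, (4, 4))
# grad_in is 1.0 at (1, 1), (1, 3), (3, 1), (3, 3) and 0.0 elsewhere

# Overlapping windows (stride < kernel_size): the centre value wins all four.
small = [[1.0, 2.0, 3.0], [4.0, 9.0, 5.0], [6.0, 7.0, 8.0]]
overlap_out, overlap_argmax = max_pool2d_forward(small, kernel_size=2, stride=1)
overlap_grad = max_pool2d_backward([[1.0, 1.0], [1.0, 1.0]], overlap_argmax, (3, 3))
# overlap_grad[1][1] == 4.0
```

The second case shows why the backward pass accumulates rather than assigns: when windows overlap, a single input position can be the maximum of several windows and collects the gradient from each of them.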

`max_pool2d(x: Tensor, kernel_size: int | tuple[int, int], stride: int | tuple[int, int] = None, pad_width: int | tuple[int, int, int] = 0, pad_mode: str = 'constant', pad_value: int = 0) -> Tensor`

Apply 2D max pooling over the input tensor.

Parameters:

x (Tensor) – Input tensor of shape (batch, channels, H, W) or (channels, H, W).

kernel_size (int | tuple[int, int]) – Pooling window size. Int or (kH, kW).

stride (int | tuple[int, int], default: None) – Step between pooling windows. Int or (sH, sW). Defaults to kernel_size if not specified.

pad_width (int | tuple[int, int, int], default: 0) – Padding before pooling. Int (all sides) or (top, bottom, left, right).

pad_mode (str, default: 'constant') – Padding mode. Defaults to "constant".

pad_value (int, default: 0) – Fill value for constant padding. Defaults to 0.
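
The parameters above determine the output's spatial size. The sketch below computes it along one axis using the usual floor-based pooling formula. Note that `pooled_size` is a hypothetical helper, not part of simplegrad, and that floor rounding (dropping a trailing partial window) is an assumption about the library's behaviour rather than something stated here.

```python
def pooled_size(size, kernel, stride=None, pad_before=0, pad_after=0):
    """Output length along one spatial axis, assuming floor rounding."""
    if stride is None:
        stride = kernel  # mirrors the documented default: stride = kernel_size
    padded = size + pad_before + pad_after
    if padded < kernel:
        raise ValueError("pooling window is larger than the padded input")
    return (padded - kernel) // stride + 1


# Matches the example at the top of this page: 28 -> 14 with a 2x2 window.
same_as_example = pooled_size(28, kernel=2)               # 14
# A 3-wide window with stride 2 and one cell of padding on each side.
padded_case = pooled_size(28, kernel=3, stride=2, pad_before=1, pad_after=1)  # 14
# Odd input size: the last row/column is dropped under floor rounding.
odd_case = pooled_size(7, kernel=2)                       # 3
```

Apply the helper once with the height settings (kH, sH, top, bottom) and once with the width settings (kW, sW, left, right) to get the full (H_out, W_out).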