Embedding

Embedding maps integer token indices to dense floating-point vectors, acting as a learnable lookup table. It stores a weight matrix of shape (num_embeddings, embedding_dim) and indexes into it during the forward pass. This is the standard first layer for NLP models that consume tokenised text.

import simplegrad as sg
import simplegrad.nn as nn

embed = nn.Embedding(num_embeddings=1000, embedding_dim=64)
token_ids = sg.Tensor([4, 17, 3, 99])   # integer indices
out = embed(token_ids)                   # shape: (4, 64)
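To make the "lookup table" description concrete, the following is a minimal pure-Python sketch, not simplegrad's implementation. It selects rows of a weight table by index and checks that this matches multiplying one-hot vectors by the same table. The helper names `lookup` and `one_hot_matmul` and the toy values are illustrative assumptions.

```python
# Pure-Python sketch of an embedding lookup; not simplegrad's implementation.

def lookup(weight, indices):
    """Return a copy of the rows of `weight` selected by `indices`."""
    return [list(weight[i]) for i in indices]

def one_hot_matmul(weight, indices):
    """Compute the same result as one-hot vectors times the weight matrix."""
    num_embeddings = len(weight)
    embedding_dim = len(weight[0])
    out = []
    for i in indices:
        one_hot = [1.0 if r == i else 0.0 for r in range(num_embeddings)]
        row = [sum(one_hot[r] * weight[r][c] for r in range(num_embeddings))
               for c in range(embedding_dim)]
        out.append(row)
    return out

# Toy table: 5 embeddings of dimension 3 (values chosen only for illustration).
weight = [[float(r * 10 + c) for c in range(3)] for r in range(5)]
token_ids = [4, 1, 1, 0]

rows = lookup(weight, token_ids)
assert rows == one_hot_matmul(weight, token_ids)
print(len(rows), len(rows[0]))  # 4 3
```

Indexing gives the same values as the one-hot product while skipping the multiplications by zero, which is why embedding layers are usually described as lookups rather than matrix products.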

`Embedding`

Bases: Module

Lookup table that maps integer indices to dense vectors.

Weights are initialized from the standard normal distribution N(0, 1) by default.
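As a rough illustration of N(0, 1) initialization, the sketch below fills a table with standard-normal samples using only the Python standard library. The function `init_normal` and its `seed` argument are hypothetical. How simplegrad draws its samples internally is not shown in this reference.

```python
import random

def init_normal(num_embeddings, embedding_dim, seed=None):
    """Build a (num_embeddings, embedding_dim) table of N(0, 1) samples."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
            for _ in range(num_embeddings)]

weight = init_normal(1000, 64, seed=0)

# Empirical mean and variance over all entries should be close to 0 and 1.
flat = [v for row in weight for v in row]
mean = sum(flat) / len(flat)
var = sum((v - mean) ** 2 for v in flat) / len(flat)
print(f"mean={mean:.3f} var={var:.3f}")
```

With unit-variance entries, the expected squared norm of each embedding vector equals `embedding_dim`, which is worth keeping in mind when choosing larger dimensions.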

Parameters:

num_embeddings (int) – Size of the vocabulary (number of rows in the embedding table).

embedding_dim (int) – Dimensionality of each embedding vector.

weight (Tensor | None, default: None) – Optional pre-built embedding matrix of shape (num_embeddings, embedding_dim).

dtype (str | None, default: None) – Data type string. Defaults to "float32".
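When supplying a pre-built matrix through `weight`, its shape has to agree with `num_embeddings` and `embedding_dim`. Whether simplegrad validates this itself is not stated here, so the sketch below shows one way to check a nested-list table before passing it in. The helper `check_weight_shape` is hypothetical and not part of the library.

```python
def check_weight_shape(weight, num_embeddings, embedding_dim):
    """Raise ValueError unless `weight` is a (num_embeddings, embedding_dim) table."""
    if len(weight) != num_embeddings:
        raise ValueError(f"expected {num_embeddings} rows, got {len(weight)}")
    for r, row in enumerate(weight):
        if len(row) != embedding_dim:
            raise ValueError(
                f"row {r}: expected {embedding_dim} columns, got {len(row)}"
            )
    return weight

pretrained = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 embeddings, dimension 2
check_weight_shape(pretrained, num_embeddings=3, embedding_dim=2)  # passes

try:
    check_weight_shape(pretrained, num_embeddings=4, embedding_dim=2)
except ValueError as err:
    print(err)  # expected 4 rows, got 3
```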

Attributes

Attribute	Type	Description
`.weight`	`Tensor`	Embedding matrix of shape `(num_embeddings, embedding_dim)`. Learnable.
`.num_embeddings`	`int`	Size of the vocabulary (total number of embeddings).
`.embedding_dim`	`int`	Dimensionality of each embedding vector.

Methods

Method	Description
`.forward()`	Look up embeddings for the given integer token indices.
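The lookup performed by `.forward()` can be pictured as appending an `embedding_dim` axis to the shape of the index input. The sketch below is pure Python and assumes that nested (batched) indices are handled element-wise. This reference only demonstrates a 1-D input, so batched behaviour in simplegrad is an assumption. The function `embed_indices` is hypothetical.

```python
def embed_indices(weight, indices):
    """Look up rows for arbitrarily nested integer indices."""
    if isinstance(indices, int):
        # Python lists accept negative indices silently, so check the range explicitly.
        if not 0 <= indices < len(weight):
            raise IndexError(
                f"index {indices} out of range for {len(weight)} embeddings"
            )
        return list(weight[indices])
    return [embed_indices(weight, item) for item in indices]

# Toy table: 10 embeddings of dimension 4.
weight = [[float(r)] * 4 for r in range(10)]

batch = [[1, 2, 3], [4, 5, 6]]           # shape (2, 3)
out = embed_indices(weight, batch)       # shape (2, 3, 4)
print(len(out), len(out[0]), len(out[0][0]))  # 2 3 4
```

The explicit range check matters for token ids: an id equal to or larger than the vocabulary size indicates a tokenizer and table mismatch and is better surfaced as an error than wrapped around.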

Inherits all methods from `Module`: `.parameters()`, `.submodules()`, `.to_device()`, `.summary()`, `.set_train_mode()`, `.set_eval_mode()`.