Embedding

Embedding maps integer token indices to dense floating-point vectors, acting as a learnable lookup table. It stores a weight matrix of shape (num_embeddings, embedding_dim) and indexes into it during the forward pass. This is the standard first layer for NLP models that consume tokenised text.

import simplegrad as sg
import simplegrad.nn as nn

embed = nn.Embedding(num_embeddings=1000, embedding_dim=64)
token_ids = sg.Tensor([4, 17, 3, 99])   # integer indices
out = embed(token_ids)                   # shape: (4, 64)
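
To make the lookup-table behaviour concrete, here is a minimal NumPy sketch of the same operation. It is an illustration only, not simplegrad's implementation: the name `table` is a stand-in for the embedding matrix, and the one-hot variant is included only to show that the lookup is equivalent to a matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the embedding matrix: 1000 rows (vocabulary), 64 columns (vector size).
table = rng.standard_normal((1000, 64)).astype(np.float32)

token_ids = np.array([4, 17, 3, 99])

# The lookup is plain integer-array indexing: one row of the table per token id.
out = table[token_ids]                                 # shape: (4, 64)

# Equivalent, but far more expensive, one-hot formulation.
one_hot = np.eye(1000, dtype=np.float32)[token_ids]    # shape: (4, 1000)
out_via_matmul = one_hot @ table                       # shape: (4, 64)
```

Indexing avoids materialising the one-hot matrix, which is why embedding layers are usually described as lookups rather than as linear layers.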

Embedding

Bases: Module

Lookup table that maps integer indices to dense vectors.

Unless a weight matrix is supplied, the weights are initialized from the standard normal distribution N(0, 1).

Parameters:

  • num_embeddings (int) –

    Size of the vocabulary (number of rows in the embedding table).

  • embedding_dim (int) –

    Dimensionality of each embedding vector.

  • weight (Tensor | None, default: None ) –

    Optional pre-built embedding matrix of shape (num_embeddings, embedding_dim).

  • dtype (str | None, default: None ) –

    Data type string for the embedding weights. When None, "float32" is used.
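
The way these parameters interact can be sketched with a small NumPy helper. `init_embedding_table` and its `seed` argument are hypothetical names used for illustration, and the shape check is an assumption about reasonable behaviour rather than something the documentation above confirms.

```python
import numpy as np

def init_embedding_table(num_embeddings, embedding_dim, weight=None, dtype=None, seed=None):
    """Hypothetical helper mirroring the documented parameters (not simplegrad code)."""
    dtype = dtype or "float32"                  # documented fallback when dtype is None
    if weight is None:
        # Documented default: draw every entry from N(0, 1).
        rng = np.random.default_rng(seed)
        return rng.standard_normal((num_embeddings, embedding_dim)).astype(dtype)
    # Assumed behaviour: a pre-built matrix must match the declared shape.
    weight = np.asarray(weight, dtype=dtype)
    if weight.shape != (num_embeddings, embedding_dim):
        raise ValueError(
            f"expected shape {(num_embeddings, embedding_dim)}, got {weight.shape}"
        )
    return weight

default_table = init_embedding_table(1000, 64, seed=0)

# A pre-built 4 x 3 matrix, e.g. loaded from pretrained vectors.
pretrained = np.arange(12).reshape(4, 3)
custom_table = init_embedding_table(4, 3, weight=pretrained)
```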

Attributes

| Attribute | Type | Description |
|---|---|---|
| .weight | Tensor | Embedding matrix of shape (num_embeddings, embedding_dim). Learnable. |
| .num_embeddings | int | Size of the vocabulary (total number of embeddings). |
| .embedding_dim | int | Dimensionality of each embedding vector. |
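
Because .weight is learnable, gradients have to flow back into the rows that were looked up. The NumPy sketch below shows the usual way this is done, a scatter-add of the output gradient into the selected rows. Whether simplegrad computes it exactly this way is an assumption, and all variable names here are illustrative.

```python
import numpy as np

num_embeddings, embedding_dim = 5, 2
token_ids = np.array([1, 3, 1])              # id 1 appears twice

# Upstream gradient with respect to the lookup output, shape (3, 2).
grad_out = np.array([[1.0, 1.0],
                     [2.0, 2.0],
                     [3.0, 3.0]], dtype=np.float32)

# Rows that were never looked up keep a zero gradient.
grad_weight = np.zeros((num_embeddings, embedding_dim), dtype=np.float32)

# np.add.at accumulates over repeated indices; `grad_weight[token_ids] += grad_out`
# would keep only one of the two contributions for id 1.
np.add.at(grad_weight, token_ids, grad_out)
```

Here row 1 receives the sum of the first and third gradient rows, row 3 receives the second, and every other row stays at zero.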

Methods

| Method | Description |
|---|---|
| .forward() | Look up embeddings for the given integer token indices. |

Inherits all methods from Module: .parameters(), .submodules(), .to_device(), .summary(), .set_train_mode(), .set_eval_mode().
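
As a rough picture of what a .forward() lookup involves, the NumPy sketch below indexes a small table with a batch of token ids and validates the indices first. The function `lookup`, its error types, and the support for 2-D index arrays are assumptions made for illustration, not documented simplegrad behaviour.

```python
import numpy as np

def lookup(table, indices):
    """Hypothetical forward pass: return one table row per integer index."""
    indices = np.asarray(indices)
    if not np.issubdtype(indices.dtype, np.integer):
        raise TypeError("indices must be integers")
    # NumPy silently wraps negative indices, so range-check explicitly.
    if indices.size and (indices.min() < 0 or indices.max() >= table.shape[0]):
        raise IndexError("index out of range for embedding table")
    # The output shape is the index shape followed by the embedding dimension.
    return table[indices]

table = np.arange(12, dtype=np.float32).reshape(6, 2)   # 6 embeddings of size 2
batch = np.array([[0, 5, 2],
                  [1, 1, 4]])                            # (batch=2, sequence=3)
out = lookup(table, batch)                               # shape: (2, 3, 2)
```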