Embedding
Embedding maps integer token indices to dense floating-point vectors, acting as a learnable lookup table. It stores a weight matrix of shape (num_embeddings, embedding_dim) and indexes into it during the forward pass. This is the standard first layer for NLP models that consume tokenised text.
```python
import simplegrad as sg
import simplegrad.nn as nn

embed = nn.Embedding(num_embeddings=1000, embedding_dim=64)
token_ids = sg.Tensor([4, 17, 3, 99])  # integer indices
out = embed(token_ids)                 # shape: (4, 64)
```
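A lookup like the one above gives the same result as multiplying a one-hot row vector by the weight matrix. This is why the table can be trained like any other weight matrix. The following is a minimal pure-Python sketch that compares the two. It makes no simplegrad calls, and the helpers `one_hot` and `row_times_matrix` are hypothetical.

```python
import math
import random

def one_hot(index, size):
    # Row vector with 1.0 at `index` and 0.0 everywhere else
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def row_times_matrix(row, matrix):
    # (1 x n) row vector times (n x d) matrix -> list of length d
    return [sum(row[i] * matrix[i][j] for i in range(len(row)))
            for j in range(len(matrix[0]))]

num_embeddings, embedding_dim = 10, 3
rng = random.Random(0)
weight = [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
          for _ in range(num_embeddings)]

token_ids = [4, 7, 3, 9]
lookup = [weight[i] for i in token_ids]  # embedding lookup: pick rows directly
via_matmul = [row_times_matrix(one_hot(i, num_embeddings), weight) for i in token_ids]

same = all(math.isclose(a, b)
           for r1, r2 in zip(lookup, via_matmul) for a, b in zip(r1, r2))
print(len(lookup), len(lookup[0]), same)  # 4 3 True
```

In practice, direct indexing is preferred because it never materialises the num_embeddings-wide one-hot vectors.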
Embedding
Bases: Module
Lookup table that maps integer indices to dense vectors.
Weights are initialized from N(0, 1) by default.
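The following is a minimal sketch of what the N(0, 1) default means. Every entry of the (num_embeddings, embedding_dim) table is drawn independently from a standard normal distribution. The helper `init_embedding_table` is hypothetical and written in pure Python. It is not simplegrad's initializer.

```python
import random
import statistics

def init_embedding_table(num_embeddings, embedding_dim, seed=None):
    # Hypothetical stand-in: each entry drawn i.i.d. from N(0, 1)
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
            for _ in range(num_embeddings)]

table = init_embedding_table(1000, 64, seed=42)
flat = [x for row in table for x in row]
print(len(table), len(table[0]))  # 1000 64
# Sample mean and standard deviation should land close to 0.0 and 1.0
print(round(statistics.fmean(flat), 2), round(statistics.pstdev(flat), 2))
```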
Parameters:
- num_embeddings (int) – Size of the vocabulary (number of rows in the embedding table).
- embedding_dim (int) – Dimensionality of each embedding vector.
- weight (Tensor | None, default: None) – Optional pre-built embedding matrix of shape (num_embeddings, embedding_dim).
- dtype (str | None, default: None) – Data type string. If None, "float32" is used.
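The parameter rules above can be summarised as a small check. A pre-built `weight` must have shape (num_embeddings, embedding_dim), and `dtype=None` falls back to "float32". The following is a sketch under two assumptions: `resolve_embedding_args` is a hypothetical helper, and the idea that a shape mismatch raises `ValueError` is also an assumption. The reference above does not say how simplegrad reports the error.

```python
def resolve_embedding_args(num_embeddings, embedding_dim, weight=None, dtype=None):
    # Hypothetical helper mirroring the documented parameter rules.
    if weight is not None:
        rows = len(weight)
        cols = len(weight[0]) if rows else 0
        if rows != num_embeddings or any(len(r) != embedding_dim for r in weight):
            # Assumption: a shape mismatch is reported as a ValueError.
            raise ValueError(
                f"weight has shape ({rows}, {cols}), "
                f"expected ({num_embeddings}, {embedding_dim})"
            )
    resolved_dtype = "float32" if dtype is None else dtype  # documented default
    return weight, resolved_dtype

pretrained = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 tokens, 2 dims
w, dt = resolve_embedding_args(3, 2, weight=pretrained)
print(dt)  # float32

try:
    resolve_embedding_args(4, 2, weight=pretrained)
except ValueError as err:
    print(err)  # weight has shape (3, 2), expected (4, 2)
```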
Attributes
| Attribute | Type | Description |
|---|---|---|
| .weight | Tensor | Embedding matrix of shape (num_embeddings, embedding_dim). Learnable. |
| .num_embeddings | int | Size of the vocabulary (total number of embeddings). |
| .embedding_dim | int | Dimensionality of each embedding vector. |
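Because .weight is learnable, it helps to know what its gradient looks like. Each output row came from `weight[token_ids[i]]`, so the upstream gradient for that row is added back into the same row. As a result, rows that were never looked up get zero gradient, and repeated ids accumulate. The following is a minimal pure-Python sketch of this standard lookup gradient. `embedding_backward` is a hypothetical helper, not simplegrad's backward code.

```python
def embedding_backward(token_ids, grad_out, num_embeddings, embedding_dim):
    # Scatter-add each upstream gradient row into the row it was looked up from
    grad_weight = [[0.0] * embedding_dim for _ in range(num_embeddings)]
    for idx, g_row in zip(token_ids, grad_out):
        for j, g in enumerate(g_row):
            grad_weight[idx][j] += g
    return grad_weight

token_ids = [2, 0, 2]                            # id 2 appears twice
grad_out = [[1.0, 1.0], [0.5, 0.5], [2.0, 2.0]]  # dLoss/dOutput, one row per id
gw = embedding_backward(token_ids, grad_out, num_embeddings=4, embedding_dim=2)
print(gw)  # [[0.5, 0.5], [0.0, 0.0], [3.0, 3.0], [0.0, 0.0]]
```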
Methods
| Method | Description |
|---|---|
| .forward() | Look up embeddings for the given integer token indices. |
Inherits all methods from Module: .parameters(), .submodules(), .to_device(), .summary(), .set_train_mode(), .set_eval_mode().
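This reference does not say whether .forward() accepts batched (multi-dimensional) indices. In many frameworks, an index array of shape S produces an output of shape S + (embedding_dim,). The following is a pure-Python sketch of that convention, assuming it holds, with a bounds check added. `embedding_forward` is hypothetical, and simplegrad's own shape and error rules may differ.

```python
def embedding_forward(weight, indices):
    # Replace every integer index (at any nesting depth) with its row of `weight`
    if isinstance(indices, int):
        if not 0 <= indices < len(weight):
            raise IndexError(f"index {indices} out of range for {len(weight)} embeddings")
        return list(weight[indices])  # copy so callers cannot mutate the table
    return [embedding_forward(weight, sub) for sub in indices]

weight = [[float(i), float(i) * 10] for i in range(5)]  # 5 embeddings, dim 2
batch = [[1, 4], [0, 1]]                                 # shape (2, 2)
out = embedding_forward(weight, batch)                   # shape (2, 2, 2)
print(out)  # [[[1.0, 10.0], [4.0, 40.0]], [[0.0, 0.0], [1.0, 10.0]]]
```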