torch_openreml.utils.categorical_to_design_matrix
- torch_openreml.utils.categorical_to_design_matrix(x, levels=None, drop_first=False, dtype=None, device=None)
Construct a one-hot encoded design matrix from a categorical vector.
Each unique level in x becomes one column in the output matrix. The column ordering follows levels if provided, otherwise the sorted unique values of x.
- Parameters:
x (list or tuple of str) – Categorical vector of string labels.
levels (list of str, optional) – Explicit level ordering. Must contain exactly the same unique values as x. Defaults to the sorted unique values of x.
drop_first (bool, optional) – Whether to drop the first column (the first level) to avoid multicollinearity with an intercept column. Defaults to False.
dtype (torch.dtype, optional) – Desired dtype of the matrix. Defaults to the PyTorch default dtype.
device (torch.device, optional) – Desired device of the matrix. Defaults to the PyTorch default device.
- Returns:
One-hot encoded matrix of shape (len(x), len(levels)), or (len(x), len(levels) - 1) if drop_first=True.
- Return type:
torch.Tensor
- Raises:
ValueError – If levels does not match the number of unique values in x.
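The encoding rules above can be sketched in plain Python. This is a hypothetical illustration, not the library's implementation: the helper one_hot_rows is an invented name, it returns nested lists of floats instead of a torch.Tensor, and it assumes levels is validated by requiring each unique value of x to appear exactly once.

```python
def one_hot_rows(x, levels=None, drop_first=False):
    """Hypothetical pure-Python sketch of the documented one-hot encoding."""
    unique = sorted(set(x))
    if levels is None:
        # Default ordering: sorted unique values of x.
        levels = unique
    else:
        levels = list(levels)
        # Assumed validation: levels must list each unique value of x exactly once.
        if len(levels) != len(unique) or sorted(set(levels)) != unique:
            raise ValueError("levels must contain exactly the unique values of x")
    column = {level: j for j, level in enumerate(levels)}
    rows = []
    for label in x:
        row = [0.0] * len(levels)
        row[column[label]] = 1.0  # a single 1.0 in the column of this row's level
        # drop_first removes the column of levels[0], which becomes the reference level.
        rows.append(row[1:] if drop_first else row)
    return rows


print(one_hot_rows(["a", "b", "a", "c"]))
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(one_hot_rows(["a", "b", "a", "c"], levels=["c", "b", "a"], drop_first=True))
# [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
```

Under the documented column ordering, passing levels=["c", "b", "a"] together with drop_first=True makes "c" the dropped reference level, so rows labelled "c" come out all zeros.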
Example:
import torch
from torch_openreml.utils import categorical_to_design_matrix

print(categorical_to_design_matrix(["a", "b", "a", "c"]))
print(categorical_to_design_matrix(["a", "b", "a", "c"], drop_first=True))
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 1.]])
tensor([[0., 0.],
        [1., 0.],
        [0., 0.],
        [0., 1.]])
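Why drop_first helps can be checked on the two outputs above. The sketch below prepends a hypothetical intercept column of ones (an assumption for illustration; the function itself does not add one) and computes the column rank with exact rational arithmetic. The helpers with_intercept and matrix_rank are written only for this illustration. With all three level columns, the level columns sum to the intercept, so the design is rank deficient; dropping the first level restores full column rank.

```python
from fractions import Fraction

# Outputs of the two calls in the example above, copied as plain lists.
full = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
dropped = [[0, 0], [1, 0], [0, 0], [0, 1]]


def with_intercept(rows):
    """Prepend a column of ones (a hypothetical intercept term)."""
    return [[1] + list(row) for row in rows]


def matrix_rank(rows):
    """Rank via Gauss-Jordan elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Zero out this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


for name, x in [("full", full), ("drop_first", dropped)]:
    design = with_intercept(x)
    print(name, "columns:", len(design[0]), "rank:", matrix_rank(design))
# full columns: 4 rank: 3
# drop_first columns: 3 rank: 3
```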