torch_openreml.utils.categorical_to_design_matrix

torch_openreml.utils.categorical_to_design_matrix(x, levels=None, drop_first=False, dtype=None, device=None)

Construct a one-hot encoded design matrix from a categorical vector.

Each unique level in x becomes one column of the output matrix. Row i contains a 1 in the column for the level of x[i] and 0 elsewhere. The columns are ordered as in levels if it is provided, otherwise by the sorted unique values of x.
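The ordering rule decides which column each label's 1 lands in. The sketch below uses a hypothetical pure-Python helper, column_index (not part of torch_openreml), to show that mapping for the default sorted order and for an explicit levels list. It illustrates the documented behaviour and is not the library's implementation.

```python
def column_index(x, levels=None):
    """Hypothetical helper: return the design-matrix column of each label in x."""
    # Default ordering: the sorted unique values of x.
    if levels is None:
        levels = sorted(set(x))
    # Column j belongs to levels[j].
    position = {level: j for j, level in enumerate(levels)}
    return [position[label] for label in x]


print(column_index(["b", "a", "c", "b"]))                   # [1, 0, 2, 1]
print(column_index(["b", "a", "c", "b"], ["c", "b", "a"]))  # [1, 2, 0, 1]
```

If the default ordering uses ordinary Python string sorting (an assumption here), it is lexicographic: "10" sorts before "9", and uppercase labels sort before lowercase ones. Passing levels explicitly avoids surprises.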

Parameters:
  • x (list or tuple of str) – Categorical vector of string labels.

  • levels (list of str, optional) – Explicit level ordering. Must contain exactly the same unique values as x. Defaults to the sorted unique values of x.

  • drop_first (bool, optional) – Whether to drop the column of the first level (the reference level) to avoid multicollinearity, for example when the model also includes an intercept column. Defaults to False.

  • dtype (torch.dtype, optional) – Desired dtype of the matrix. Defaults to the PyTorch default dtype.

  • device (torch.device, optional) – Desired device of the matrix. Defaults to the PyTorch default device.
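The parameters above combine as follows. This is a minimal pure-Python sketch using a hypothetical design_matrix function that returns nested lists instead of a tensor. It assumes the stricter validation rule from the levels description, that levels must hold exactly the unique values of x with no duplicates. The real function may only compare counts, as its Raises entry suggests.

```python
def design_matrix(x, levels=None, drop_first=False):
    """Hypothetical pure-Python sketch of the documented behaviour (not the library code)."""
    unique = set(x)
    if levels is None:
        levels = sorted(unique)
    elif len(levels) != len(set(levels)) or set(levels) != unique:
        # Assumption: reject duplicates and any level set that differs from x's.
        raise ValueError(
            f"levels {list(levels)} do not match unique values {sorted(unique)}"
        )
    position = {level: j for j, level in enumerate(levels)}
    rows = []
    for label in x:
        row = [0.0] * len(levels)
        row[position[label]] = 1.0
        # drop_first removes the column of levels[0], the reference level.
        rows.append(row[1:] if drop_first else row)
    return rows


full = design_matrix(["a", "b", "a", "c"])
reduced = design_matrix(["a", "b", "a", "c"], drop_first=True)
print(len(full), len(full[0]))        # 4 3
print(len(reduced), len(reduced[0]))  # 4 2
try:
    design_matrix(["a", "b"], levels=["a", "z"])
except ValueError as err:
    print("ValueError:", err)
```

A row of the reduced matrix that is all zeros belongs to the reference level. Its effect is absorbed by whatever other column (typically an intercept) the model includes.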

Returns:

One-hot encoded matrix of shape (len(x), len(levels)), or (len(x), len(levels) - 1) when drop_first=True. Here levels is the resolved level list, which defaults to the sorted unique values of x.

Return type:

torch.Tensor

Raises:

ValueError – If levels does not match the number of unique values in x.

Example:

import torch
from torch_openreml.utils import categorical_to_design_matrix

print(categorical_to_design_matrix(["a", "b", "a", "c"]))
# tensor([[1., 0., 0.],
#         [0., 1., 0.],
#         [1., 0., 0.],
#         [0., 0., 1.]])

print(categorical_to_design_matrix(["a", "b", "a", "c"], drop_first=True))
# tensor([[0., 0.],
#         [1., 0.],
#         [0., 0.],
#         [0., 1.]])
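The multicollinearity that drop_first avoids shows up once the example matrices are joined with an intercept column (an assumption: the function itself adds no intercept). Every row of the full one-hot matrix sums to 1, so the intercept equals the sum of the level columns. The pure-Python sketch below copies the two example outputs as plain lists and computes ranks with exact Gauss-Jordan elimination (a hypothetical matrix_rank helper).

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gauss-Jordan elimination over exact fractions (hypothetical helper)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


full = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]  # default output above
reduced = [[0, 0], [1, 0], [0, 0], [0, 1]]           # drop_first=True output above

# Prepend an intercept column of ones to each matrix.
with_intercept_full = [[1] + row for row in full]
with_intercept_reduced = [[1] + row for row in reduced]

print(len(with_intercept_full[0]), matrix_rank(with_intercept_full))        # 4 3
print(len(with_intercept_reduced[0]), matrix_rank(with_intercept_reduced))  # 3 3
```

With all three level columns, the four columns have rank 3, so the coefficients are not identifiable. After dropping the reference level, the rank equals the number of columns.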