Introduction to REML Estimation¶

The REML class implements restricted maximum likelihood (REML) estimation for linear mixed models using the average information (AI) algorithm. The AI algorithm combines the stability of the expected information matrix with the curvature information of the observed information matrix, leading to efficient and robust estimation of variance components.

This vignette introduces the REML model formulation, explains how to construct covariance models, demonstrates optimisation workflows, and covers post-estimation utilities such as BLUEs, BLUPs, predictions, residuals, and convergence diagnostics.

The model¶

The REML class assumes the linear mixed model

\[\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon},\]

where

\(\mathbf{y}\) is the response vector of length \(n\),
\(\mathbf{X}\) is the fixed-effects design matrix,
\(\boldsymbol{\beta}\) is the vector of fixed effects,
\(\mathbf{Z}\) is the random-effects design matrix,
\(\mathbf{b}\) is the vector of random effects,
\(\boldsymbol{\varepsilon}\) is the residual error vector.

The random effects and residuals are assumed independent with

\[\mathbf{b} \sim \mathcal{N}(\mathbf{0}, \mathbf{G}), \qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}).\]

The marginal covariance of \(\mathbf{y}\) is therefore

\[\mathbf{V}(\boldsymbol{\theta}) = \mathbf{Z}\mathbf{G}(\boldsymbol{\theta})\mathbf{Z}^\top + \mathbf{R}(\boldsymbol{\theta}),\]

where \(\boldsymbol{\theta}\) denotes the variance-component parameters.

The REML estimator maximises the restricted log-likelihood

\[\ell_R(\boldsymbol{\theta}) = -\frac{1}{2} \Bigl( \log |\mathbf{V}| + \log |\mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X}| + \mathbf{y}^\top \mathbf{P} \mathbf{y} \Bigr),\]

where

\[\mathbf{P} = \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X} \left( \mathbf{X}^\top \mathbf{V}^{-1}\mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{V}^{-1}\]

is the residual projection matrix.

Important

Internally, REML repeatedly evaluates

the covariance matrix \(\mathbf{V}\),
its derivatives \(\partial \mathbf{V} / \partial \theta_k\),
the score vector,
and the average information matrix,

until convergence criteria are satisfied.

Constructing the REML object¶

The REML constructor supports multiple ways to define the covariance structure. The appropriate approach depends on the complexity of the model and the desired level of control.

The most important constructor arguments are:

map_theta_to_v: Callable mapping \(\boldsymbol{\theta}\) to the marginal covariance matrix \(\mathbf{V}\).
map_theta_to_dv: Callable returning the derivatives \(\partial\mathbf{V}/\partial\theta_k\).
v_builder: A covariance Matrix object representing the covariance structure.
map_theta_to_g: Optional mapping from parameters to the random-effects covariance matrix \(\mathbf{G}\). Required for BLUPs and conditional predictions.
mask_theta_to_g: Boolean mask selecting which parameters are passed to map_theta_to_g.

Exactly one of map_theta_to_v or v_builder must be supplied.

Via `map_theta_to_v` only¶

The minimal interface requires only a function mapping parameters to the marginal covariance matrix.

import torch
from torch_openreml import REML

n = 50

def map_theta_to_v(theta):

    sigma2 = theta[0].exp()
    rho = torch.sigmoid(theta[1])

    I = torch.eye(n)
    J = torch.ones(n, n)

    return sigma2 * (I + rho * J)

reml = REML(map_theta_to_v=map_theta_to_v)

In this example:

theta[0] controls the variance scale,
theta[1] controls a correlation parameter,
exponential and sigmoid transforms enforce parameter constraints.

The covariance matrix is

\[\mathbf{V} = \sigma^2 (\mathbf{I} + \rho \mathbf{J}),\]

where \(\mathbf{J}\) is the all-ones matrix.

When only map_theta_to_v is supplied, REML computes derivatives automatically using torch.func.jacrev().

This approach is convenient for prototyping, but automatic differentiation can become expensive for large covariance matrices. At the same time, it offers essentially unrestricted flexibility: the marginal covariance can be constructed through arbitrary differentiable PyTorch operations, ranging from simple parameterizations to highly sophisticated matrix algebra, iterative procedures, decompositions, simulation-based constructions, dynamically assembled covariance components, or even neural networks that directly output or parameterize covariance structure.

Because the covariance construction is defined directly in Python/PyTorch code rather than through a fixed covariance specification, users are free to incorporate conditional branching, stochastic generation, adaptive logic, external modules, or even entirely different covariance structures across optimization iterations (although such behavior is usually not statistically meaningful in practice). In effect, any covariance model that can be expressed as a differentiable computational graph in PyTorch can be used within this framework.

Via `map_theta_to_v` and `map_theta_to_dv`¶

For better performance, analytical derivatives can be supplied explicitly.

import torch
from torch_openreml import REML

n = 50

def map_theta_to_v(theta):

    sigma2 = theta[0].exp()
    rho = torch.sigmoid(theta[1])

    I = torch.eye(n)
    J = torch.ones(n, n)

    return sigma2 * (I + rho * J)

def map_theta_to_dv(theta):

    sigma2 = theta[0].exp()
    rho = torch.sigmoid(theta[1])

    dsigma2 = sigma2
    drho = rho * (1 - rho)

    I = torch.eye(n)
    J = torch.ones(n, n)

    dV_dtheta0 = dsigma2 * (I + rho * J)
    dV_dtheta1 = sigma2 * drho * J

    return torch.stack([dV_dtheta0, dV_dtheta1])

reml = REML(
    map_theta_to_v=map_theta_to_v,
    map_theta_to_dv=map_theta_to_dv,
)

The derivative function must return a tensor of shape (num_free_params, n, n), where slice k corresponds to

\[\frac{\partial \mathbf{V}} {\partial \theta_k}.\]

Providing analytical derivatives is strongly recommended for large models because derivative evaluation is typically one of the most computationally expensive parts of REML optimisation.

Via a covariance Matrix builder¶

For realistic mixed models, the recommended workflow is to construct the covariance structure using the torch_openreml.covariance matrix system.

The covariance builder API provides reusable covariance components such as:

along with composition operators including:

Example:

import torch

from torch_openreml import REML
from torch_openreml.covariance import (
    DummyMatrix,
    ScalarMatrix,
    CovariancePropagation,
    Sum,
)

n, p = 50, 2

y = torch.randn(n)
X = torch.randn(n, p)

Z = DummyMatrix(["a", "b"] * 25)()

V = Sum(
    CovariancePropagation(
        Z,
        ScalarMatrix(2),
    ),
    ScalarMatrix(n),
)

reml = REML(v_builder=V)

The builder automatically manages:

parameter bookkeeping,
parameter transforms,
covariance assembly,
covariance derivatives.

Internally, REML calls

map_theta_to_v()
map_theta_to_dv()

provided by the builder.

The derivative calculation uses grad(), which attempts a closed-form manual_grad() implementation first and falls back to automatic differentiation when necessary.

Providing `map_theta_to_g`¶

Some methods require access to the random-effects covariance matrix \(\mathbf{G}\) separately from the marginal covariance matrix \(\mathbf{V}\).

These include:

The map_theta_to_g argument may be:

a callable,
a covariance Matrix,
or None.

Example:

G = ScalarMatrix(2)

reml = REML(
    v_builder=V,
    map_theta_to_g=G,
    mask_theta_to_g=torch.tensor([True, False]),
)

The mask specifies which elements of \(\boldsymbol{\theta}\) are passed into G.

This is useful when the full parameter vector contains both random-effect and residual variance parameters.

Running the optimiser¶

Once the REML object is constructed, optimisation is performed using optimize().

import torch

from torch_openreml import REML
from torch_openreml.covariance import (
    DummyMatrix,
    ScalarMatrix,
    CovariancePropagation,
    Sum,
)

n, p = 50, 2

y = torch.randn(n)
X = torch.randn(n, p)

Z = DummyMatrix(["a", "b"] * 25)()

V = Sum(
    CovariancePropagation(
        Z,
        ScalarMatrix(2),
    ),
    ScalarMatrix(n),
)

reml = REML(v_builder=V)

theta_start = torch.zeros(V.num_free_params)

theta_hat, beta_hat, n_iter = reml.optimize(
    y,
    X,
    theta_start,
    verbose=2,
)

∥∇∥:       9.8709, ∥Δ∥: 23.0035, η: 1.00, ∥Δᶜ∥: 23.0035, log 𝓛: -25.2022
∥∇∥:       0.7636, ∥Δ∥: 0.0078, η: 1.00, ∥Δᶜ∥: 0.0078, log 𝓛: -21.8203 (+3.3819)
∥∇∥:       0.0060, ∥Δ∥: 0.0001, η: 1.00, ∥Δᶜ∥: 0.0001, log 𝓛: -21.8173 (+0.0030)
∥∇∥:       0.0000, ∥Δ∥: 0.0000, η: 1.00, ∥Δᶜ∥: 0.0000, log 𝓛: -21.8173 (+0.0000)

[∇: score, Δ: 𝐉⁻¹∇, η: learning rate, Δᶜ: clip(𝛉 + ηΔ, lb, ub) - 𝛉, 𝓛: restricted likelihood]

✓ Converged at iteration 4

Inputs¶

y: Response vector of shape (n,).
X: Fixed-effects design matrix of shape (n, p).
theta_start: Initial parameter vector.

Choosing sensible starting values can substantially improve convergence, especially for complex covariance structures.

Returned values¶

theta_hat: Final variance-component estimates.
beta_hat: Estimated fixed effects evaluated at theta_hat.
n_iter: Number of completed optimisation iterations.

Optimisation controls¶

The optimisation routine supports several keyword arguments.

max_iter (default 200)

Maximum number of iterations.

eta (default 1.0)

Step-size multiplier applied to the AI update.

The parameter update is

\[\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} + \eta \, \mathbf{AI}^{-1}\mathbf{s}.\]

Smaller values of eta can improve stability when the optimisation oscillates or diverges.

lb and ub

Optional lower and upper parameter bounds applied after each update.

verbose (default 0)

Controls optimisation output.

0: silent
1: progress bar
2: detailed iteration diagnostics

Detailed diagnostics include:

score norm,
update norm,
log-likelihood,
learning rate,
iteration count.

Convergence criteria¶

At each iteration, convergence is evaluated using three criteria.

Score norm¶

The score vector must satisfy

\[\|\mathbf{s}\| < \texttt{tol\_score}.\]

This checks whether the gradient is close to zero.

Parameter update norm¶

The parameter update must satisfy

\[\|\Delta\| < \texttt{tol\_delta}.\]

This checks whether the parameters have stabilised.

Log-likelihood change¶

The restricted log-likelihood change must satisfy

\[|\ell_R^{(t)} - \ell_R^{(t-1)}| < \texttt{tol\_loglik}.\]

Default tolerances¶

The default tolerances are:

tol_score = 1e-4
tol_delta = 1e-4
tol_loglik = 1e-4

All enabled criteria must be satisfied simultaneously.

Criteria may be individually disabled using:

check_score=False
check_delta=False
check_loglik=False

At least two iterations are required before convergence can be declared.

Post-estimation methods¶

After optimisation, the fitted REML object provides several utilities for extracting estimates and predictions.

Retrieving parameter estimates¶

theta_last = reml.get_theta(select="last")
theta_best = reml.get_theta(select="best")

beta_last = reml.get_beta(select="last")
beta_best = reml.get_beta(select="best")

select="last": Returns the final optimisation iterate.
select="best": Returns the iterate with the highest restricted log-likelihood.

The "best" option is useful if optimisation temporarily overshoots or oscillates near convergence.

BLUE — fixed effects¶

The best linear unbiased estimator (BLUE) of the fixed effects is

\[\widehat{\boldsymbol{\beta}} = \left( \mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{V}^{-1} \mathbf{y}.\]

Compute it using:

beta_hat = reml.blue(y, X, theta_hat)

BLUP — random effects¶

The best linear unbiased predictor (BLUP) of the random effects is

\[\widehat{\mathbf{b}} = \mathbf{G} \mathbf{Z}^\top \mathbf{V}^{-1} \left( \mathbf{y} - \mathbf{X}\widehat{\boldsymbol{\beta}} \right).\]

Compute it using:

b_hat = reml.blup(
    y,
    X,
    Z,
    theta_hat,
    map_theta_to_g=ScalarMatrix(2),
    mask_theta_to_g=torch.tensor([True, False]),
)

This method requires map_theta_to_g.

Predictions¶

Marginal predictions use only fixed effects:

\[\widehat{\mathbf{y}}_{\mathrm{marginal}} = \mathbf{X}\widehat{\boldsymbol{\beta}}.\]

Conditional predictions additionally include random effects:

\[\widehat{\mathbf{y}}_{\mathrm{conditional}} = \mathbf{X}\widehat{\boldsymbol{\beta}} + \mathbf{Z}\widehat{\mathbf{b}}.\]

Example:

y_hat_marginal = reml.marginal_predict(
    y,
    X,
    theta_hat,
)

y_hat_conditional = reml.predict(
    y,
    X,
    Z,
    theta_hat,
    map_theta_to_g=ScalarMatrix(2),
    mask_theta_to_g=torch.tensor([True, False]),
)

Residuals¶

Marginal residuals:

\[\mathbf{e}_{\mathrm{marginal}} = \mathbf{y} - \widehat{\mathbf{y}}_{\mathrm{marginal}}.\]

Conditional residuals:

\[\mathbf{e}_{\mathrm{conditional}} = \mathbf{y} - \widehat{\mathbf{y}}_{\mathrm{conditional}}.\]

Example:

e_marginal = reml.marginal_residual(
    y,
    X,
    theta_hat,
)

e_conditional = reml.residual(
    y,
    X,
    Z,
    theta_hat,
    map_theta_to_g=ScalarMatrix(2),
    mask_theta_to_g=torch.tensor([True, False]),
)

Evaluating the log-likelihood¶

The REML log-likelihood can be evaluated directly without optimisation.

loglik = reml.loglik(y, X, theta_hat)

This is useful for:

debugging,
likelihood profiling,
model comparison,
monitoring optimisation trajectories.

Complete example¶

The following example demonstrates a mixed model with:

genotype random effects,
replicate-block random effects,
residual variance.

import torch

from torch_openreml import REML
from torch_openreml.utils import augment, n_distinct

from torch_openreml.covariance import (
    DummyMatrix,
    IdentityMatrix,
    ScalarMatrix,
    Sum,
    CovariancePropagation,
    KroneckerProduct,
)

from torch_openreml.example_data import john_alpha

# --- response ---
y = torch.tensor(john_alpha["yield"].values)

# --- fixed effects ---
X = augment(
    torch.ones(len(john_alpha), 1),
    DummyMatrix(john_alpha["rep"], drop_first=True)()
)

# --- random effect design matrices ---
Z_gen = DummyMatrix(john_alpha["gen"])
Z_rep_block = DummyMatrix(john_alpha["rep"], john_alpha["block"])

# --- covariance components ---
G_gen = ScalarMatrix(n_distinct(john_alpha["gen"]))
G_rep = IdentityMatrix(n_distinct(john_alpha["rep"]))
G_block = ScalarMatrix(n_distinct(john_alpha["block"]))

R = ScalarMatrix(len(john_alpha))

# --- marginal covariance ---
V = Sum(
    CovariancePropagation(Z_gen, G_gen),
    CovariancePropagation(
        Z_rep_block,
        KroneckerProduct(G_rep, G_block)
    ),
    R
)

# --- REML fit ---
reml = REML(v_builder=V)

theta_start = torch.zeros(V.num_free_params)

theta_hat, beta_hat, n_iter = reml.optimize(
    y,
    X,
    theta_start,
    verbose=2,
)

# --- results ---
print("theta:", theta_hat)

print("variance components:", V.build_params(theta_hat))

print("fixed effects:", beta_hat)

print("loglik:", reml.loglik(y, X, theta_hat))

∥∇∥:      41.8197, ∥Δ∥: 8.9438, η: 1.00, ∥Δᶜ∥: 8.9438, log 𝓛: -34.3129
∥∇∥:  315227.4375, ∥Δ∥: 0.8838, η: 1.00, ∥Δᶜ∥: 0.8838, log 𝓛: -201185.3906 (-201151.0777)
∥∇∥:  119229.9453, ∥Δ∥: 0.8520, η: 1.00, ∥Δᶜ∥: 0.8520, log 𝓛: -71844.4531 (+129340.9375)
∥∇∥:   44273.4570, ∥Δ∥: 0.8143, η: 1.00, ∥Δᶜ∥: 0.8143, log 𝓛: -26065.4551 (+45778.9980)
∥∇∥:   16330.5117, ∥Δ∥: 0.7587, η: 1.00, ∥Δᶜ∥: 0.7587, log 𝓛: -9457.1396 (+16608.3154)
∥∇∥:    6001.4102, ∥Δ∥: 0.7142, η: 1.00, ∥Δᶜ∥: 0.7142, log 𝓛: -3392.8696 (+6064.2700)
∥∇∥:    2194.8989, ∥Δ∥: 0.7031, η: 1.00, ∥Δᶜ∥: 0.7031, log 𝓛: -1181.2805 (+2211.5891)
∥∇∥:     793.7851, ∥Δ∥: 0.6952, η: 1.00, ∥Δᶜ∥: 0.6952, log 𝓛: -382.4908 (+798.7897)
∥∇∥:     278.6875, ∥Δ∥: 0.6715, η: 1.00, ∥Δᶜ∥: 0.6715, log 𝓛: -102.2992 (+280.1916)
∥∇∥:      90.4116, ∥Δ∥: 0.5957, η: 1.00, ∥Δᶜ∥: 0.5957, log 𝓛: -11.4274 (+90.8718)
∥∇∥:      23.9216, ∥Δ∥: 0.4006, η: 1.00, ∥Δᶜ∥: 0.4006, log 𝓛:  12.6792 (+24.1066)
∥∇∥:       3.8674, ∥Δ∥: 0.1392, η: 1.00, ∥Δᶜ∥: 0.1392, log 𝓛:  16.5908 (+3.9116)
∥∇∥:       0.2448, ∥Δ∥: 0.0157, η: 1.00, ∥Δᶜ∥: 0.0157, log 𝓛:  16.8078 (+0.2170)
∥∇∥:       0.0104, ∥Δ∥: 0.0007, η: 1.00, ∥Δᶜ∥: 0.0007, log 𝓛:  16.8099 (+0.0021)
∥∇∥:       0.0004, ∥Δ∥: 0.0000, η: 1.00, ∥Δᶜ∥: 0.0000, log 𝓛:  16.8100 (+0.0001)
∥∇∥:       0.0000, ∥Δ∥: 0.0000, η: 1.00, ∥Δᶜ∥: 0.0000, log 𝓛:  16.8099 (-0.0001)

[∇: score, Δ: 𝐉⁻¹∇, η: learning rate, Δᶜ: clip(𝛉 + ηΔ, lb, ub) - 𝛉, 𝓛: restricted likelihood]

✓ Converged at iteration 16
theta: tensor([-0.9728, -1.3281, -1.2529])
variance components: tensor([0.1429, 0.0702, 0.0816])
fixed effects: tensor([ 4.5183,  0.2978, -0.4140])
loglik: tensor(16.8098)

Optimisation history¶

After optimisation, the full iteration history is stored in reml.history.

This dictionary contains per-iteration records including:

theta
beta
loglik
score
ai
delta
update

This history is useful for:

diagnosing convergence problems,
inspecting optimisation trajectories,
plotting likelihood curves,
monitoring parameter stability.

Example:

scores = [
    torch.norm(s).item()
    for s in reml.history["score"]
]

logliks = [
    ll.item()
    for ll in reml.history["loglik"]
]

print(
    "Score norms:",
    [f"{s:.6f}" for s in scores],
)

print(
    "Log-likelihoods:",
    [f"{ll:.4f}" for ll in logliks],
)

Score norms: ['41.819736', '315227.437500', '119229.945312', '44273.457031', '16330.511719', '6001.410156', '2194.898926', '793.785095', '278.687469', '90.411598', '23.921585', '3.867368', '0.244757', '0.010371', '0.000418', '0.000021']
Log-likelihoods: ['-34.3129', '-201185.3906', '-71844.4531', '-26065.4551', '-9457.1396', '-3392.8696', '-1181.2805', '-382.4908', '-102.2992', '-11.4274', '12.6792', '16.5908', '16.8078', '16.8099', '16.8100', '16.8099']

Introduction to REML Estimation¶

The model¶

Constructing the REML object¶

Via map_theta_to_v only¶

Via map_theta_to_v and map_theta_to_dv¶

Via a covariance Matrix builder¶

Providing map_theta_to_g¶

Running the optimiser¶

Inputs¶

Returned values¶

Optimisation controls¶

Convergence criteria¶

Score norm¶

Parameter update norm¶

Log-likelihood change¶

Default tolerances¶

Post-estimation methods¶

Retrieving parameter estimates¶

BLUE — fixed effects¶

BLUP — random effects¶

Predictions¶

Residuals¶

Evaluating the log-likelihood¶

Complete example¶

Optimisation history¶

Via `map_theta_to_v` only¶

Via `map_theta_to_v` and `map_theta_to_dv`¶

Providing `map_theta_to_g`¶