Introduction to REML Estimation¶
The REML class implements restricted maximum
likelihood (REML) estimation for linear mixed models using the average
information (AI) algorithm. The AI algorithm combines the stability of
the expected information matrix with the curvature information of the
observed information matrix, leading to efficient and robust estimation
of variance components.
This vignette introduces the REML model formulation, explains how to construct covariance models, demonstrates optimisation workflows, and covers post-estimation utilities such as BLUEs, BLUPs, predictions, residuals, and convergence diagnostics.
The model¶
The REML class assumes the linear mixed model
where
\(\mathbf{y}\) is the response vector of length \(n\),
\(\mathbf{X}\) is the fixed-effects design matrix,
\(\boldsymbol{\beta}\) is the vector of fixed effects,
\(\mathbf{Z}\) is the random-effects design matrix,
\(\mathbf{b}\) is the vector of random effects,
\(\boldsymbol{\varepsilon}\) is the residual error vector.
The random effects and residuals are assumed independent with
The marginal covariance of \(\mathbf{y}\) is therefore
where \(\boldsymbol{\theta}\) denotes the variance-component parameters.
The REML estimator maximises the restricted log-likelihood
where
is the residual projection matrix.
Important
Internally, REML repeatedly evaluates
the covariance matrix \(\mathbf{V}\),
its derivatives \(\partial \mathbf{V} / \partial \theta_k\),
the score vector,
and the average information matrix,
until convergence criteria are satisfied.
Constructing the REML object¶
The REML constructor supports multiple ways to
define the covariance structure. The appropriate approach depends on the
complexity of the model and the desired level of control.
The most important constructor arguments are:
map_theta_to_vCallable mapping \(\boldsymbol{\theta}\) to the marginal covariance matrix \(\mathbf{V}\).
map_theta_to_dvCallable returning the derivatives \(\partial\mathbf{V}/\partial\theta_k\).
v_builderA covariance
Matrixobject representing the covariance structure.map_theta_to_gOptional mapping from parameters to the random-effects covariance matrix \(\mathbf{G}\). Required for BLUPs and conditional predictions.
mask_theta_to_gBoolean mask selecting which parameters are passed to
map_theta_to_g.
Exactly one of map_theta_to_v or v_builder must be supplied.
Via map_theta_to_v only¶
The minimal interface requires only a function mapping parameters to the marginal covariance matrix.
import torch
from torch_openreml import REML
n = 50
def map_theta_to_v(theta):
sigma2 = theta[0].exp()
rho = torch.sigmoid(theta[1])
I = torch.eye(n)
J = torch.ones(n, n)
return sigma2 * (I + rho * J)
reml = REML(map_theta_to_v=map_theta_to_v)
In this example:
theta[0]controls the variance scale,theta[1]controls a correlation parameter,exponential and sigmoid transforms enforce parameter constraints.
The covariance matrix is
where \(\mathbf{J}\) is the all-ones matrix.
When only map_theta_to_v is supplied,
REML computes derivatives automatically using
torch.func.jacrev().
This approach is convenient for prototyping, but automatic differentiation can become expensive for large covariance matrices. At the same time, it offers essentially unrestricted flexibility: the marginal covariance can be constructed through arbitrary differentiable PyTorch operations, ranging from simple parameterizations to highly sophisticated matrix algebra, iterative procedures, decompositions, simulation-based constructions, dynamically assembled covariance components, or even neural networks that directly output or parameterize covariance structure.
Because the covariance construction is defined directly in Python/PyTorch code rather than through a fixed covariance specification, users are free to incorporate conditional branching, stochastic generation, adaptive logic, external modules, or even entirely different covariance structures across optimization iterations (although such behavior is usually not statistically meaningful in practice). In effect, any covariance model that can be expressed as a differentiable computational graph in PyTorch can be used within this framework.
Via map_theta_to_v and map_theta_to_dv¶
For better performance, analytical derivatives can be supplied explicitly.
import torch
from torch_openreml import REML
n = 50
def map_theta_to_v(theta):
sigma2 = theta[0].exp()
rho = torch.sigmoid(theta[1])
I = torch.eye(n)
J = torch.ones(n, n)
return sigma2 * (I + rho * J)
def map_theta_to_dv(theta):
sigma2 = theta[0].exp()
rho = torch.sigmoid(theta[1])
dsigma2 = sigma2
drho = rho * (1 - rho)
I = torch.eye(n)
J = torch.ones(n, n)
dV_dtheta0 = dsigma2 * (I + rho * J)
dV_dtheta1 = sigma2 * drho * J
return torch.stack([dV_dtheta0, dV_dtheta1])
reml = REML(
map_theta_to_v=map_theta_to_v,
map_theta_to_dv=map_theta_to_dv,
)
The derivative function must return a tensor of shape
(num_free_params, n, n), where slice k corresponds to
Providing analytical derivatives is strongly recommended for large models because derivative evaluation is typically one of the most computationally expensive parts of REML optimisation.
Via a covariance Matrix builder¶
For realistic mixed models, the recommended workflow is to construct the
covariance structure using the
torch_openreml.covariance matrix system.
The covariance builder API provides reusable covariance components such as:
along with composition operators including:
Example:
import torch
from torch_openreml import REML
from torch_openreml.covariance import (
DummyMatrix,
ScalarMatrix,
CovariancePropagation,
Sum,
)
n, p = 50, 2
y = torch.randn(n)
X = torch.randn(n, p)
Z = DummyMatrix(["a", "b"] * 25)()
V = Sum(
CovariancePropagation(
Z,
ScalarMatrix(2),
),
ScalarMatrix(n),
)
reml = REML(v_builder=V)
The builder automatically manages:
parameter bookkeeping,
parameter transforms,
covariance assembly,
covariance derivatives.
Internally, REML calls
map_theta_to_v()map_theta_to_dv()
provided by the builder.
The derivative calculation uses
grad(),
which attempts a closed-form
manual_grad()
implementation first and falls back to automatic differentiation when
necessary.
Providing map_theta_to_g¶
Some methods require access to the random-effects covariance matrix \(\mathbf{G}\) separately from the marginal covariance matrix \(\mathbf{V}\).
These include:
The map_theta_to_g argument may be:
a callable,
a covariance
Matrix,or
None.
Example:
G = ScalarMatrix(2)
reml = REML(
v_builder=V,
map_theta_to_g=G,
mask_theta_to_g=torch.tensor([True, False]),
)
The mask specifies which elements of
\(\boldsymbol{\theta}\) are passed into G.
This is useful when the full parameter vector contains both random-effect and residual variance parameters.
Running the optimiser¶
Once the REML object is constructed, optimisation is performed using
optimize().
import torch
from torch_openreml import REML
from torch_openreml.covariance import (
DummyMatrix,
ScalarMatrix,
CovariancePropagation,
Sum,
)
n, p = 50, 2
y = torch.randn(n)
X = torch.randn(n, p)
Z = DummyMatrix(["a", "b"] * 25)()
V = Sum(
CovariancePropagation(
Z,
ScalarMatrix(2),
),
ScalarMatrix(n),
)
reml = REML(v_builder=V)
theta_start = torch.zeros(V.num_free_params)
theta_hat, beta_hat, n_iter = reml.optimize(
y,
X,
theta_start,
verbose=2,
)
∥∇∥: 9.8709, ∥Δ∥: 23.0035, η: 1.00, ∥Δᶜ∥: 23.0035, log 𝓛: -25.2022
∥∇∥: 0.7636, ∥Δ∥: 0.0078, η: 1.00, ∥Δᶜ∥: 0.0078, log 𝓛: -21.8203 (+3.3819)
∥∇∥: 0.0060, ∥Δ∥: 0.0001, η: 1.00, ∥Δᶜ∥: 0.0001, log 𝓛: -21.8173 (+0.0030)
∥∇∥: 0.0000, ∥Δ∥: 0.0000, η: 1.00, ∥Δᶜ∥: 0.0000, log 𝓛: -21.8173 (+0.0000)
[∇: score, Δ: 𝐉⁻¹∇, η: learning rate, Δᶜ: clip(𝛉 + ηΔ, lb, ub) - 𝛉, 𝓛: restricted likelihood]
✓ Converged at iteration 4
Inputs¶
yResponse vector of shape
(n,).XFixed-effects design matrix of shape
(n, p).theta_startInitial parameter vector.
Choosing sensible starting values can substantially improve convergence, especially for complex covariance structures.
Returned values¶
theta_hatFinal variance-component estimates.
beta_hatEstimated fixed effects evaluated at
theta_hat.n_iterNumber of completed optimisation iterations.
Optimisation controls¶
The optimisation routine supports several keyword arguments.
max_iter(default200)Maximum number of iterations.
eta(default1.0)Step-size multiplier applied to the AI update.
The parameter update is
\[\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} + \eta \, \mathbf{AI}^{-1}\mathbf{s}.\]Smaller values of
etacan improve stability when the optimisation oscillates or diverges.lbandubOptional lower and upper parameter bounds applied after each update.
verbose(default0)Controls optimisation output.
0: silent1: progress bar2: detailed iteration diagnostics
Detailed diagnostics include:
score norm,
update norm,
log-likelihood,
learning rate,
iteration count.
Convergence criteria¶
At each iteration, convergence is evaluated using three criteria.
Score norm¶
The score vector must satisfy
This checks whether the gradient is close to zero.
Parameter update norm¶
The parameter update must satisfy
This checks whether the parameters have stabilised.
Log-likelihood change¶
The restricted log-likelihood change must satisfy
Default tolerances¶
The default tolerances are:
tol_score = 1e-4tol_delta = 1e-4tol_loglik = 1e-4
All enabled criteria must be satisfied simultaneously.
Criteria may be individually disabled using:
check_score=Falsecheck_delta=Falsecheck_loglik=False
At least two iterations are required before convergence can be declared.
Post-estimation methods¶
After optimisation, the fitted REML object provides several utilities for extracting estimates and predictions.
Retrieving parameter estimates¶
theta_last = reml.get_theta(select="last")
theta_best = reml.get_theta(select="best")
beta_last = reml.get_beta(select="last")
beta_best = reml.get_beta(select="best")
select="last"Returns the final optimisation iterate.
select="best"Returns the iterate with the highest restricted log-likelihood.
The "best" option is useful if optimisation temporarily overshoots
or oscillates near convergence.
BLUE — fixed effects¶
The best linear unbiased estimator (BLUE) of the fixed effects is
Compute it using:
beta_hat = reml.blue(y, X, theta_hat)
BLUP — random effects¶
The best linear unbiased predictor (BLUP) of the random effects is
Compute it using:
b_hat = reml.blup(
y,
X,
Z,
theta_hat,
map_theta_to_g=ScalarMatrix(2),
mask_theta_to_g=torch.tensor([True, False]),
)
This method requires map_theta_to_g.
Predictions¶
Marginal predictions use only fixed effects:
Conditional predictions additionally include random effects:
Example:
y_hat_marginal = reml.marginal_predict(
y,
X,
theta_hat,
)
y_hat_conditional = reml.predict(
y,
X,
Z,
theta_hat,
map_theta_to_g=ScalarMatrix(2),
mask_theta_to_g=torch.tensor([True, False]),
)
Residuals¶
Marginal residuals:
Conditional residuals:
Example:
e_marginal = reml.marginal_residual(
y,
X,
theta_hat,
)
e_conditional = reml.residual(
y,
X,
Z,
theta_hat,
map_theta_to_g=ScalarMatrix(2),
mask_theta_to_g=torch.tensor([True, False]),
)
Evaluating the log-likelihood¶
The REML log-likelihood can be evaluated directly without optimisation.
loglik = reml.loglik(y, X, theta_hat)
This is useful for:
debugging,
likelihood profiling,
model comparison,
monitoring optimisation trajectories.
Complete example¶
The following example demonstrates a mixed model with:
genotype random effects,
replicate-block random effects,
residual variance.
import torch
from torch_openreml import REML
from torch_openreml.utils import augment, n_distinct
from torch_openreml.covariance import (
DummyMatrix,
IdentityMatrix,
ScalarMatrix,
Sum,
CovariancePropagation,
KroneckerProduct,
)
from torch_openreml.example_data import john_alpha
# --- response ---
y = torch.tensor(john_alpha["yield"].values)
# --- fixed effects ---
X = augment(
torch.ones(len(john_alpha), 1),
DummyMatrix(john_alpha["rep"], drop_first=True)()
)
# --- random effect design matrices ---
Z_gen = DummyMatrix(john_alpha["gen"])
Z_rep_block = DummyMatrix(john_alpha["rep"], john_alpha["block"])
# --- covariance components ---
G_gen = ScalarMatrix(n_distinct(john_alpha["gen"]))
G_rep = IdentityMatrix(n_distinct(john_alpha["rep"]))
G_block = ScalarMatrix(n_distinct(john_alpha["block"]))
R = ScalarMatrix(len(john_alpha))
# --- marginal covariance ---
V = Sum(
CovariancePropagation(Z_gen, G_gen),
CovariancePropagation(
Z_rep_block,
KroneckerProduct(G_rep, G_block)
),
R
)
# --- REML fit ---
reml = REML(v_builder=V)
theta_start = torch.zeros(V.num_free_params)
theta_hat, beta_hat, n_iter = reml.optimize(
y,
X,
theta_start,
verbose=2,
)
# --- results ---
print("theta:", theta_hat)
print("variance components:", V.build_params(theta_hat))
print("fixed effects:", beta_hat)
print("loglik:", reml.loglik(y, X, theta_hat))
∥∇∥: 41.8197, ∥Δ∥: 8.9438, η: 1.00, ∥Δᶜ∥: 8.9438, log 𝓛: -34.3129
∥∇∥: 315227.4375, ∥Δ∥: 0.8838, η: 1.00, ∥Δᶜ∥: 0.8838, log 𝓛: -201185.3906 (-201151.0777)
∥∇∥: 119229.9453, ∥Δ∥: 0.8520, η: 1.00, ∥Δᶜ∥: 0.8520, log 𝓛: -71844.4531 (+129340.9375)
∥∇∥: 44273.4570, ∥Δ∥: 0.8143, η: 1.00, ∥Δᶜ∥: 0.8143, log 𝓛: -26065.4551 (+45778.9980)
∥∇∥: 16330.5117, ∥Δ∥: 0.7587, η: 1.00, ∥Δᶜ∥: 0.7587, log 𝓛: -9457.1396 (+16608.3154)
∥∇∥: 6001.4102, ∥Δ∥: 0.7142, η: 1.00, ∥Δᶜ∥: 0.7142, log 𝓛: -3392.8696 (+6064.2700)
∥∇∥: 2194.8989, ∥Δ∥: 0.7031, η: 1.00, ∥Δᶜ∥: 0.7031, log 𝓛: -1181.2805 (+2211.5891)
∥∇∥: 793.7851, ∥Δ∥: 0.6952, η: 1.00, ∥Δᶜ∥: 0.6952, log 𝓛: -382.4908 (+798.7897)
∥∇∥: 278.6875, ∥Δ∥: 0.6715, η: 1.00, ∥Δᶜ∥: 0.6715, log 𝓛: -102.2992 (+280.1916)
∥∇∥: 90.4116, ∥Δ∥: 0.5957, η: 1.00, ∥Δᶜ∥: 0.5957, log 𝓛: -11.4274 (+90.8718)
∥∇∥: 23.9216, ∥Δ∥: 0.4006, η: 1.00, ∥Δᶜ∥: 0.4006, log 𝓛: 12.6792 (+24.1066)
∥∇∥: 3.8674, ∥Δ∥: 0.1392, η: 1.00, ∥Δᶜ∥: 0.1392, log 𝓛: 16.5908 (+3.9116)
∥∇∥: 0.2448, ∥Δ∥: 0.0157, η: 1.00, ∥Δᶜ∥: 0.0157, log 𝓛: 16.8078 (+0.2170)
∥∇∥: 0.0104, ∥Δ∥: 0.0007, η: 1.00, ∥Δᶜ∥: 0.0007, log 𝓛: 16.8099 (+0.0021)
∥∇∥: 0.0004, ∥Δ∥: 0.0000, η: 1.00, ∥Δᶜ∥: 0.0000, log 𝓛: 16.8100 (+0.0001)
∥∇∥: 0.0000, ∥Δ∥: 0.0000, η: 1.00, ∥Δᶜ∥: 0.0000, log 𝓛: 16.8099 (-0.0001)
[∇: score, Δ: 𝐉⁻¹∇, η: learning rate, Δᶜ: clip(𝛉 + ηΔ, lb, ub) - 𝛉, 𝓛: restricted likelihood]
✓ Converged at iteration 16
theta: tensor([-0.9728, -1.3281, -1.2529])
variance components: tensor([0.1429, 0.0702, 0.0816])
fixed effects: tensor([ 4.5183, 0.2978, -0.4140])
loglik: tensor(16.8098)
Optimisation history¶
After optimisation, the full iteration history is stored in
reml.history.
This dictionary contains per-iteration records including:
thetabetaloglikscoreaideltaupdate
This history is useful for:
diagnosing convergence problems,
inspecting optimisation trajectories,
plotting likelihood curves,
monitoring parameter stability.
Example:
scores = [
torch.norm(s).item()
for s in reml.history["score"]
]
logliks = [
ll.item()
for ll in reml.history["loglik"]
]
print(
"Score norms:",
[f"{s:.6f}" for s in scores],
)
print(
"Log-likelihoods:",
[f"{ll:.4f}" for ll in logliks],
)
Score norms: ['41.819736', '315227.437500', '119229.945312', '44273.457031', '16330.511719', '6001.410156', '2194.898926', '793.785095', '278.687469', '90.411598', '23.921585', '3.867368', '0.244757', '0.010371', '0.000418', '0.000021']
Log-likelihoods: ['-34.3129', '-201185.3906', '-71844.4531', '-26065.4551', '-9457.1396', '-3392.8696', '-1181.2805', '-382.4908', '-102.2992', '-11.4274', '12.6792', '16.5908', '16.8078', '16.8099', '16.8100', '16.8099']