Student Perspectives: Unconstrained Reparameterisation of Covariance Matrix Modelling via Modified Cholesky Decomposition

A post by Xinyue Guan, PhD student on the Compass programme.

Introduction

Effective management of renewable energy sources depends on accurate forecasts, particularly in a probabilistic format that supports decision-making under uncertainty. When dealing with multiple forecasts across different locations or times, it is crucial to assess whether errors will compound or offset each other. Let \boldsymbol{Y} = (y_1, y_2, \ldots, y_p) be a vector of forecast errors with lead times l_1, l_2, \ldots, l_p. To understand these dependencies, we model the covariance matrix of the forecast errors.

One major difficulty in modelling the covariance matrix is guaranteeing the positive definiteness of the estimator, which is the matrix analogue of the positivity constraint in variance modelling. Several unconstrained matrix reparameterisations are available, such as the Cholesky decomposition, the spherical decomposition and the matrix logarithm [2].

In this post, we will focus on the modified Cholesky decomposition and demonstrate how this decomposition allows us to reframe the complex problem of covariance modelling into a more intuitive series of linear regressions, revealing the underlying structure of forecast errors.

Modified Cholesky Decomposition

Instead of modelling the covariance matrix \Sigma directly, we model its inverse, the precision matrix \Sigma^{-1}. This allows us to reframe the complex problem of ensuring positive definiteness into a simpler, unconstrained estimation task.

The decomposition is expressed as:
\Sigma^{-1} = T^T D^{-2} T,

where T is a unit lower triangular matrix and D is a diagonal matrix.
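To make the reparameterisation concrete, here is a minimal base-R sketch (a toy example with my own variable names, not part of the original analysis) that recovers T and D^2 from a given covariance matrix using the standard Cholesky factorisation:

set.seed(1)
p <- 5
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p)           # an arbitrary positive-definite covariance matrix

L_raw <- t(chol(Sigma))                   # lower triangular, Sigma = L_raw %*% t(L_raw)
d2    <- diag(L_raw)^2                    # diagonal entries of D^2
L     <- L_raw %*% diag(1 / diag(L_raw))  # unit lower triangular, Sigma = L %*% diag(d2) %*% t(L)
T_mat <- solve(L)                         # the unit lower triangular factor T

# Check the reparameterisation: Sigma^{-1} = t(T) %*% D^{-2} %*% T
max(abs(t(T_mat) %*% diag(1 / d2) %*% T_mat - solve(Sigma)))   # numerically zero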

A key advantage of the modified Cholesky decomposition is its direct and intuitive link to linear regression [3]. Consider regressing each component y_t of \boldsymbol{Y} on its predecessors y_1, \ldots, y_{t-1}:

y_t = \sum_{k=1}^{t-1} \phi_{tk} y_k + \epsilon_t, \quad t=1, \ldots, p.

The components of T and D have a direct statistical meaning in the context of this sequence of regressions:

1. The off-diagonal entries of the matrix T are the negative regression coefficients, T_{tk} = -\phi_{tk} for k < t, with ones on the diagonal (see the numerical check after this list).

2. The diagonal entries of D^2 are the variances of the residuals \epsilon_t:
D^2=\text{diag}(\text{var}(\epsilon_1),\ldots,\text{var}(\epsilon_p)).
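This correspondence can be checked numerically. The following base-R sketch (again a toy simulation rather than the wind data) fits one of the regressions with lm() and compares the results with the factors obtained from the sample covariance matrix:

set.seed(2)
n <- 500; p <- 4
Y <- matrix(rnorm(n * p), n, p) %*% chol(toeplitz(0.6^(0:(p - 1))))  # correlated zero-mean data
S <- crossprod(Y) / n                     # sample covariance (uncentred, since the data are zero-mean)

# Modified Cholesky factors of S, as in the previous sketch
L_raw <- t(chol(S))
d2    <- diag(L_raw)^2
T_mat <- solve(L_raw %*% diag(1 / diag(L_raw)))

# Regression of y_t on its predecessors, shown here for t = 3
fit <- lm(Y[, 3] ~ Y[, 1:2] - 1)          # no intercept, matching the zero-mean setup
-coef(fit)                                # matches T_mat[3, 1:2]
mean(residuals(fit)^2)                    # matches d2[3]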

By modelling the unconstrained entries of T and the logarithm of the diagonal entries of D^2 separately, the positive definiteness of the resulting covariance matrix is automatically guaranteed. The logarithm transformation simply ensures the positivity of the diagonal entries of D^2.
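As a quick illustration of this guarantee, the short sketch below fills T and \log(D^2) with arbitrary unconstrained values and confirms that the implied covariance matrix is positive definite:

set.seed(3)
p <- 6
T_mat <- diag(p)
T_mat[lower.tri(T_mat)] <- rnorm(p * (p - 1) / 2)    # unconstrained off-diagonal entries of T
log_d2 <- rnorm(p)                                   # unconstrained log-variances

Prec  <- t(T_mat) %*% diag(exp(-log_d2)) %*% T_mat   # Sigma^{-1} = T' D^{-2} T
Sigma <- solve(Prec)
min(eigen(Sigma, symmetric = TRUE)$values) > 0       # TRUE: positive definite by construction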

Real-life Dataset and Sparsity of Modified Cholesky Factor

We analysed a wind power dataset from Scotland, which contains 351 daily observations. For each day, the data consist of 24 hourly forecast errors derived from a quantile regression model [1].

The figure below shows a heatmap of the sample covariance matrix for this dataset, visualizing the dependence structure between the hourly errors.

The left plot in the figure above shows the matrix resulting from the modified Cholesky decomposition, computed with the mcd() function in the SCM R package. In this matrix, the diagonal entries of \log(D^2) are combined with the off-diagonal entries of the matrix T. To better visualize the structure of T, we can set the diagonal elements to zero, as shown in the right plot; we then observe that only the entries on the first sub-diagonal are significantly different from zero.

The plot of the diagonal entries of \log(D^2) in the figure above shows that the first entry is distinctly different from the others. This observation is directly explained by the linear regression interpretation of the modified Cholesky decomposition. The first entry, \log(d_1^2), represents the logarithm of the unconditional variance of the first variable y_1. In contrast, all subsequent entries, \log(d_h^2) for h>1, represent the logarithm of the conditional variances of the regression errors \epsilon_h, after accounting for all preceding variables y_1,\dots,y_{h-1}. Therefore, it is expected that the first term, which carries the full variance of the initial variable, would differ from the subsequent residual variances.

The figure above plots the last row of the matrix T and clearly shows that only the values closest to the main diagonal are significantly non-zero, with the remaining entries oscillating around zero. This observation aligns with the linear regression interpretation: in a high-order autoregression, the coefficients of the most recent variables are expected to be the most significant, while the influence of more distant past variables diminishes.
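This pattern can be reproduced with a simple toy example. In the sketch below (a stationary AR(1) process, not the wind data), each variable depends only on its immediate predecessor, so T is non-zero only on the first sub-diagonal and the first entry of \log(D^2) stands apart from the rest:

p   <- 24
phi <- 0.7
Sigma_ar1 <- phi^abs(outer(1:p, 1:p, "-")) / (1 - phi^2)   # covariance of a stationary AR(1)

L_raw <- t(chol(Sigma_ar1))
d2    <- diag(L_raw)^2
T_mat <- solve(L_raw %*% diag(1 / diag(L_raw)))

round(T_mat[1:5, 1:5], 3)   # -phi on the first sub-diagonal, zeros elsewhere
round(log(d2[1:5]), 3)      # first entry (unconditional variance) differs; the rest are equal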

This emergent sparsity is a highly desirable feature which allows our model to be parsimonious, meaning it has far fewer parameters to estimate. This significantly reduces computational cost and, more importantly, lowers the risk of over-fitting the model, leading to better predictive performance.

Conclusion

The modified Cholesky decomposition is a powerful tool for covariance modelling for two key reasons. Not only does it solve the technical challenge of ensuring positive definiteness but, more importantly, its inherent link to linear regression provides clear, interpretable insight into the data’s underlying structure.

Our analysis used this property to reveal a sparse, autoregressive dependency within wind power forecast errors. By uncovering this simple structure, the decomposition allows for the development of more parsimonious and robust statistical models that are easier to estimate and less prone to over-fitting.

References

[1] Gilbert, C. (2021). Topics in High-dimensional Energy Forecasting. PhD dissertation, University of Strathclyde, Glasgow.
[2] Pinheiro, J. and Bates, D. (1996). Unconstrained parametrizations for variance-covariance matrices. Statistics and Computing, 6:289–296.
[3] Pourahmadi, M. (2011). Covariance Estimation: The GLM and Regularization Perspectives. Statistical Science, 26(3):369–387.