A post by Yuqi Zhang, PhD student on the Compass programme.
Introduction
Multiway data arise in many fields, from macroeconomics to neuroscience. In many modern applications, observations are naturally represented as higher-order arrays (tensors) rather than vectors or matrices.
While principal component analysis (PCA) and singular value decomposition (SVD) are powerful tools for analysing matrices, they are insufficient for such multi-dimensional data because they cannot capture interactions across multiple modes. A common but naive approach is to vectorise or matricise the data; however, this destroys the inherent array structure and ignores important dependencies present in the data.
Tensor methods are specifically designed to exploit and preserve the multi-dimensional relationships within data arrays. By working directly with tensors, we are able to uncover latent patterns that are not accessible via standard matrix-based techniques.
From SVD and PCA to Factor Models
Given observed data $\{X_t\}_{t=1}^T \subset \mathbb{R}^{p}$ (assumed mean-centred), principal component analysis (PCA) seeks the main directions of variation by studying the sample covariance matrix,
\[
\widehat{\Sigma} = \frac{1}{T}\sum_{t=1}^T X_tX_t^\intercal,
\]
and extracting its largest eigenvalues and eigenvectors. The SVD underpins this approach, but statistical interest focuses on modelling the population second moment,
\[
\Sigma = \mathbb{E}(X_tX_t^\intercal).
\]
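As a minimal illustration of this step, the following NumPy sketch forms the sample covariance of mean-centred observations and extracts its leading eigenpairs. The sizes $p$, $T$ and the number of components $r$ are arbitrary values chosen for the example.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
p, T = 20, 500                        # dimension and sample size (illustrative)
X = rng.standard_normal((p, T))       # columns are the observations X_t
X = X - X.mean(axis=1, keepdims=True) # mean-centre each variable

Sigma_hat = (X @ X.T) / T             # sample covariance (1/T) sum_t X_t X_t^T

# eigendecomposition; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
r = 3
top_vectors = eigvecs[:, ::-1][:, :r] # leading r principal directions
top_values = eigvals[::-1][:r]        # corresponding variances
\end{verbatim}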
The factor model assumes each observation is a linear combination of a small number of latent factors, plus idiosyncratic noise:
\[
X_t = \Lambda F_t + E_t,\quad \text{or }\; X_{it} = \sum_{j=1}^r \Lambda_{ij} F_{jt} + e_{it},
\]
where $\Lambda$ is the loading matrix, $F_t$ is the vector of latent factors, and $E_t$ is idiosyncratic noise. The covariance structure is then
\[
\Sigma = \Lambda \Sigma_F \Lambda^\intercal + \Sigma_E,
\]
where $\Sigma_F$ and $\Sigma_E$ are the covariances of the factors and the idiosyncratic components. Standard assumptions are: $r$ is small (so $\Sigma$ is approximately low rank); the loadings are pervasive ($\Lambda^\intercal \Lambda / p \to \Sigma_\Lambda$ is positive definite); and $E_t$ is uncorrelated with $F_t$, with $\Sigma_E$ usually diagonal or weakly dependent. Under these conditions, PCA can consistently estimate the factor structure.
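To make this concrete, here is a hedged simulation sketch (all dimensions and noise levels are illustrative, not taken from any particular study): we generate data from $X_t = \Lambda F_t + E_t$ and check that the leading eigenvectors of the sample covariance approximately span the column space of $\Lambda$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
p, T, r = 50, 1000, 3                     # illustrative sizes

Lambda = rng.standard_normal((p, r))      # loading matrix
F = rng.standard_normal((r, T))           # latent factors F_t
E = 0.5 * rng.standard_normal((p, T))     # idiosyncratic noise
X = Lambda @ F + E                        # X_t = Lambda F_t + E_t

Sigma_hat = (X @ X.T) / T
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
Lambda_hat = eigvecs[:, -r:]              # PCA estimate of the loading space

# compare column spaces via principal angles (cosines should be near 1)
Q, _ = np.linalg.qr(Lambda)
cosines = np.linalg.svd(Q.T @ Lambda_hat, compute_uv=False)
print("cosines of principal angles:", np.round(cosines, 3))
\end{verbatim}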
However, classical vector or matrix factor models are only suited to data with two modes (e.g.\ variable $\times$ time or variable $\times$ individual). They cannot capture the higher-order dependencies and interactions present in modern multiway datasets, such as country $\times$ variable $\times$ time panels. This limitation motivates the development and use of \textit{tensor} methods, which explicitly retain and model the multi-dimensional array structure.
What is a Tensor?
Tensors extend familiar data structures such as vectors (order 1) and matrices (order 2) to arrays of three or more modes. For example, macroeconomic indicators measured across countries and variables over time form a third-order tensor; see the Figure below.
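As a small sketch of this idea, a hypothetical country $\times$ variable $\times$ time panel can be stored directly as a third-order array, whereas vectorising or matricising it discards the mode structure (sizes below are illustrative).
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n_countries, n_variables, n_quarters = 10, 8, 40   # illustrative sizes

# third-order tensor: entry (i, j, t) is variable j for country i at time t
panel = rng.standard_normal((n_countries, n_variables, n_quarters))
print(panel.ndim, panel.shape)                     # 3, (10, 8, 40)

# matricising (mode-1 unfolding) flattens two modes into one and loses structure
unfolded = panel.reshape(n_countries, -1)          # 10 x 320 matrix
\end{verbatim}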
Tensor Decomposition
Tensor decompositions help represent high-dimensional data in a compact, interpretable form. Two of the most important decompositions are CP (CANDECOMP/PARAFAC) and Tucker decompositions.
CP (CANDECOMP/PARAFAC) decomposition. The CP decomposition expresses a tensor as a sum of rank-one components:
\[
\mathcal{X} \approx \sum_{j=1}^r \mathbf{a}_j^{(1)} \circ \mathbf{a}_j^{(2)} \circ \cdots \circ \mathbf{a}_j^{(N)},
\]
where each $\mathbf{a}_j^{(n)}$ is a vector in mode $n$, and $\circ$ is the outer product. CP decomposition gives a highly interpretable and often unique representation under mild conditions (see Figure 4).
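As a hedged sketch of the formula above, the following builds a rank-$r$ third-order tensor as a sum of outer products of mode-wise vectors (mode sizes and rank are illustrative).
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
I, J, K, r = 6, 5, 4, 2             # mode sizes and CP rank (illustrative)

A1 = rng.standard_normal((I, r))    # mode-1 vectors a_j^(1) as columns
A2 = rng.standard_normal((J, r))    # mode-2 vectors a_j^(2)
A3 = rng.standard_normal((K, r))    # mode-3 vectors a_j^(3)

# sum_{j=1}^r a_j^(1) o a_j^(2) o a_j^(3), written with einsum
X = np.einsum('ir,jr,kr->ijk', A1, A2, A3)
print(X.shape)                      # (6, 5, 4)
\end{verbatim}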

Tucker decomposition. Tucker decomposition generalises the matrix SVD to higher-order tensors by factorising the data into a core tensor and a set of factor matrices:
\[
\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},
\]
where $\mathcal{G}$ is the core tensor and $A^{(n)}$ are the loading matrices for each mode. Tucker allows different numbers of components per mode and a flexible core tensor (see Figure 5).
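The multilinear product in the display can be written as repeated mode-$n$ products. Below is a minimal sketch for a third-order tensor, with a small helper for the mode-$n$ product; the multilinear ranks and dimensions are illustrative.
\begin{verbatim}
import numpy as np

def mode_n_product(T, M, n):
    """Multiply tensor T by matrix M along mode n (contract T's n-th index)."""
    T = np.moveaxis(T, n, 0)                 # bring mode n to the front
    out = np.tensordot(M, T, axes=(1, 0))    # apply M along that mode
    return np.moveaxis(out, 0, n)            # move the new mode back

rng = np.random.default_rng(4)
r1, r2, r3 = 2, 3, 2                         # multilinear ranks (illustrative)
I, J, K = 6, 5, 4                            # observed dimensions

G = rng.standard_normal((r1, r2, r3))        # core tensor
A1 = rng.standard_normal((I, r1))            # mode-wise factor matrices
A2 = rng.standard_normal((J, r2))
A3 = rng.standard_normal((K, r3))

# X = G x_1 A1 x_2 A2 x_3 A3
X = mode_n_product(mode_n_product(mode_n_product(G, A1, 0), A2, 1), A3, 2)
print(X.shape)                               # (6, 5, 4)
\end{verbatim}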
Algorithms and extensions. Popular algorithms for tensor decompositions include alternating least squares (ALS), higher-order SVD (HOSVD), and structured variants for nonnegative, sparse, or smooth decompositions. Unlike the matrix SVD, tensor decompositions usually need iterative algorithms and are sensitive to initialisation and rank selection. For a full review, see [2].
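As one example of these algorithms, here is a hedged sketch of a truncated HOSVD in plain NumPy: each factor matrix is taken from the leading left singular vectors of the corresponding mode unfolding, and the core is obtained by projecting onto those subspaces. It is a teaching sketch, not an optimised implementation.
\begin{verbatim}
import numpy as np

def unfold(T, n):
    """Mode-n unfolding: mode n becomes the rows, all other modes the columns."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def truncated_hosvd(X, ranks):
    """Truncated HOSVD: mode-wise SVDs of the unfoldings, then the projected core."""
    factors = []
    for n, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, n), full_matrices=False)
        factors.append(U[:, :r])
    core = X
    for n, U in enumerate(factors):
        # project mode n onto its factor subspace (mode-n product with U^T)
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, n, 0), axes=(1, 0)), 0, n)
    return core, factors

rng = np.random.default_rng(5)
X = rng.standard_normal((6, 5, 4))
core, factors = truncated_hosvd(X, ranks=(2, 3, 2))
print(core.shape, [U.shape for U in factors])
\end{verbatim}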
Tensor Factor Models
The tensor factor model extends classical factor analysis to multiway data, building on the Tucker and CP decompositions. For observed data $\{\mathcal{X}_t\} \subset \mathbb{R}^{p_1 \times \ldots \times p_K}$, the model is
\[
\mathcal{X}_t = \mathcal{G}_t \times_1 \Lambda_1 \times_2 \Lambda_2 \cdots \times_K \Lambda_K + \mathcal{E}_t,
\]
where $\mathcal{G}_t$ is a low-dimensional core tensor, $\Lambda_k$ are mode-specific loading matrices, and $\mathcal{E}_t$ is idiosyncratic noise. This structure matches the Tucker decomposition, with CP recovered as a special case when the core tensor is super-diagonal.
This framework generalises the matrix factor model — where a matrix is expressed as low-rank signal plus noise — to higher-order arrays. Typical assumptions are: the signal has low multilinear rank (most variation is explained by a few factors per mode); each loading matrix is full rank and well-conditioned (ensuring identifiability and pervasiveness); and the noise tensor is mean zero, weakly dependent, and uncorrelated with the factors.
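To illustrate the data-generating process (not the estimation procedures studied in the cited papers), the following hedged sketch simulates from the model for $K = 3$: each observation is a small core tensor pushed through mode-wise loadings, plus noise. All sizes and the noise scale are illustrative choices.
\begin{verbatim}
import numpy as np

def mode_n_product(T, M, n):
    """Multiply tensor T by matrix M along mode n."""
    return np.moveaxis(
        np.tensordot(M, np.moveaxis(T, n, 0), axes=(1, 0)), 0, n)

rng = np.random.default_rng(6)
p1, p2, p3 = 10, 8, 6              # observed dimensions (illustrative)
r1, r2, r3 = 2, 2, 2               # multilinear ranks
T_len = 200                        # number of time points

Lambdas = [rng.standard_normal((p, r))
           for p, r in zip((p1, p2, p3), (r1, r2, r3))]   # mode-wise loadings

X = np.empty((T_len, p1, p2, p3))
for t in range(T_len):
    G_t = rng.standard_normal((r1, r2, r3))               # latent core tensor
    S_t = G_t
    for k, Lam in enumerate(Lambdas):
        S_t = mode_n_product(S_t, Lam, k)                 # G_t x_1 L_1 ... x_K L_K
    E_t = 0.3 * rng.standard_normal((p1, p2, p3))         # idiosyncratic noise
    X[t] = S_t + E_t
print(X.shape)                     # (200, 10, 8, 6)
\end{verbatim}
In practice the loadings and core are unknown; estimation typically proceeds by mode-wise PCA of suitable unfoldings or projected estimators, as in [4] for the matrix case.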
Tensor factor models are widely used for high-dimensional multiway data, such as neuroimaging, recommender systems, macroeconomic panels, and genomics. They capture shared patterns and interactions across multiple modes, offering more flexible and interpretable dimension reduction than matrix models. See [1,2] for reviews.
Their popularity stems from theoretical flexibility, practical effectiveness, and a close link to tensor decompositions, which allows established computational and inferential tools to be reused for statistical inference.
Applications and Outlook
Tensor methods are increasingly central to statistics, data science, and machine learning. In neuroscience, they reveal interpretable spatio-temporal patterns. In recommender systems, they capture user-item-context interactions. In macroeconomics, they help extract trends and shocks in high-dimensional panels. The link to latent factor models supports applications in clustering, anomaly detection, and other multivariate analyses.
Current research focuses on scalable algorithms, robust inference procedures, and methods that use domain constraints such as sparsity or smoothness. As multiway data become more common, tensor decompositions offer a principled and versatile toolkit for statistical modelling and analysis.
References
[1] Xuan Bi, Xiwei Tang, Yubai Yuan, Yanqing Zhang, and Annie Qu. Tensors in statistics. Annual Review of Statistics and Its Application, 8(1):345–368, 2021.
[2] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
[3] Tatsuya Yokota. Very basics of tensors with graphical notations: unfolding, calculations, and decompositions. arXiv preprint arXiv:2411.16094, 2024.
[4] Long Yu, Yong He, Xinbing Kong, and Xinsheng Zhang. Projected estimation for large-dimensional matrix factor models. Journal of Econometrics, 229(1):201–217, 2022.