This vignette and all its examples assume you are familiar with the Matrix package. If you are not, perhaps you should be! It provides an organised and optimised set of methods for dealing with both dense and sparse matrices.
Suppose the n × n matrix W is given by W = A−1 + UB−1V, where A, U, B and V are n × n, n × p, p × p and p × n matrices, respectively. Then W−1 = A − AUO−1VA, where O = B + VAU (a p × p matrix), and |W| = |O||A|−1|B|−1. These two relationships are often called the Woodbury matrix identity and the matrix determinant lemma, respectively. When A and B are both very sparse, or A is sparse (or even diagonal) and p is much smaller than n, manipulating the matrices via these relationships rather than forming W directly can lead to massive computational savings. A symmetric special case is given by W = A−1 + XB−1X′, where A and B must now be symmetric, and X is an n × p matrix.
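As a quick sanity check, both identities are easy to verify numerically on a small dense example. The following sketch uses base R only, and all the names (A0, B0, and so on) are introduced just for this illustration:

set.seed(42)
n0 <- 5; p0 <- 2
A0 <- diag(n0) + 0.1 * matrix(rnorm(n0 * n0), n0, n0)
B0 <- diag(p0) + 0.1 * matrix(rnorm(p0 * p0), p0, p0)
U0 <- matrix(rnorm(n0 * p0), n0, p0)
V0 <- matrix(rnorm(p0 * n0), p0, n0)
W0 <- solve(A0) + U0 %*% solve(B0) %*% V0  # W = A^-1 + U B^-1 V
O0 <- B0 + V0 %*% A0 %*% U0                # O = B + V A U
max(abs(solve(W0) - (A0 - A0 %*% U0 %*% solve(O0) %*% V0 %*% A0)))  # ~0: Woodbury identity
abs(det(W0) - det(O0) / (det(A0) * det(B0)))                        # ~0: determinant lemma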
This package provides a class, WoodburyMatrix, to help in using these identities. WoodburyMatrix is a subclass of the Matrix::Matrix class, and has two subtypes: GWoodburyMatrix for general matrices, and SWoodburyMatrix for symmetric matrices. A WoodburyMatrix can be formed by providing the matrices A and B (with U and V defaulting to identity matrices) to the WoodburyMatrix(...) function:
library(Matrix)
library(WoodburyMatrix)
set.seed(20200721)
A1 <- rsparsematrix(500, 500, 0.05)
B1 <- rsparsematrix(500, 500, 0.05)
W1 <- WoodburyMatrix(A1, B1)
The matrices A, B, U and V are cast to Matrix::Matrix, so they can be dense or sparse, diagonal, triangular, or any of the many subtypes available in the Matrix package. The symmetric form can be used with symmetric = TRUE, with the new restriction that A and B must be symmetric:
A2 <- crossprod(rsparsematrix(500, 500, 0.05))
B2 <- crossprod(rsparsematrix(500, 500, 0.05))
W2 <- WoodburyMatrix(A2, B2, symmetric = TRUE)
Some attempt is also made at automatic symmetry detection (see the WoodburyMatrix function documentation), but it is always best to explicitly provide symmetric = TRUE if you know the matrices are symmetric.
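For instance, since crossprod above returns a symmetric Matrix subtype, detection has a good chance of succeeding without the flag. A small illustration (W3 is a name introduced here, and whether detection succeeds depends on how A and B are stored):

W3 <- WoodburyMatrix(A2, B2)  # no symmetric = TRUE
class(W3)  # SWoodburyMatrix if symmetry was detected, GWoodburyMatrix otherwise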
The following methods are defined for all WoodburyMatrix objects:

- instantiate(W): forms the implicitly defined matrix W. This is usually not a good idea, because the matrix can be very large and this operation very slow.
- t(W): transpose of the matrix, returning another WoodburyMatrix.
- solve(W): returns the inverse of W; not recommended in general because the result will usually be dense.
- solve(W, b): solves the system Wx = b for x; can be much faster than doing this directly for large, sparse A and B.
- determinant(W): returns the log determinant of W, again much faster than a direct method.
- W %*% x: the matrix product of W and x, where x can be any object for which solve is defined for the submatrices A and B (most typically, a numeric vector).

Various other methods defined for Matrix::Matrix also work, including dim, dimnames, nrow, ncol, crossprod, and tcrossprod. SWoodburyMatrix objects additionally have the following methods:

- mahalanobis(x, center, cov): where cov is an SWoodburyMatrix; calculates the Mahalanobis distance.
- dwnorm(x, mean, covariance, log): calculates the (log-)density of a multivariate normal distribution with an SWoodburyMatrix object as covariance.
- rwnorm(n, mean, covariance): draws samples from a multivariate normal distribution with an SWoodburyMatrix object as covariance.
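As a brief illustration of some of these methods, using the W1 and W2 objects constructed earlier (a sketch; the names b0 and x0 are introduced here, and the values returned depend on the randomly generated matrices):

b0 <- rnorm(500)
x0 <- solve(W1, b0)               # solves W1 %*% x0 = b0 without ever forming W1
determinant(W1)                   # log determinant via the matrix determinant lemma
mahalanobis(b0, rep(0, 500), W2)  # Mahalanobis distance under the symmetric W2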
A very common example in which this technique applies is Bayesian linear regression, in which $$ \begin{aligned} (Y \mid \beta) & \sim N(X\beta, \sigma^2 I_n), \\ \beta & \sim N(\mu_\beta, Q^{-1}_\beta), \end{aligned} $$ where Y is n × 1, X is an n × p matrix of covariates, and β is a p × 1 vector of regression coefficients. Integrating out β yields $$ \begin{aligned} Y & \sim N(X\mu_\beta, \Sigma_Y), \\ \Sigma_Y & = \sigma^2 I_n + X Q^{-1}_\beta X' \end{aligned} $$ with density $$ p(Y) = (2\pi)^{-n/2} |\Sigma_Y|^{-1/2} \exp\left\{ -\frac{1}{2} (Y - X\mu_\beta)' \Sigma^{-1}_Y (Y - X\mu_\beta) \right\}. $$ For most X and Qβ, the covariance ΣY is very dense, so for large n using this expression directly is infeasible. But the expression for ΣY can be mapped to the Woodbury matrix identity via A → σ−2In, B → Qβ and X → X, which facilitates computation of the density by requiring inversion only of the p × p matrix O.
Suppose n = 2000, p = 20, σ2 = 1, Qβ = Ip, μβ = 0 and X is composed of random entries. We can code this example and form a WoodburyMatrix for ΣY using:
n <- 2000
p <- 20
sigma_squared <- 1
X <- matrix(rnorm(n * p), nrow = n)
Q_beta <- Diagonal(p)
mu_beta <- rep(0, p)
Sigma_Y <- WoodburyMatrix(
  Diagonal(n, 1 / sigma_squared),  # A = sigma^-2 I_n
  Q_beta,                          # B = Q_beta
  X = X
)
Now compare the size of Sigma_Y versus directly forming this matrix (which can be done using the instantiate(.) function):
Sigma_Y_direct <- instantiate(Sigma_Y)
object.size(Sigma_Y)
#> 329952 bytes
object.size(instantiate(Sigma_Y))
#> 32001400 bytes
The WoodburyMatrix object is much smaller. You can draw samples from this distribution using the function rwnorm, and calculate its log density using dwnorm:
Y <- rwnorm(1, X %*% mu_beta, Sigma_Y)
system.time(print(dwnorm(Y, X %*% mu_beta, Sigma_Y, log = TRUE)))
#> [1] -2902.92
#> user system elapsed
#> 0.008 0.015 0.006
Compare the time this takes to the direct version:
Y_m <- Y - X %*% mu_beta
system.time(print(as.numeric(
  - 0.5 * n * log(2 * pi)
  - 0.5 * determinant(Sigma_Y_direct)$modulus
  - 0.5 * crossprod(Y_m, solve(Sigma_Y_direct, Y_m))
)))
#> [1] -2902.92
#> user system elapsed
#> 0.193 0.137 0.083
For large n the direct version would be completely infeasible.
A related example, also from statistics, involves the sum of two vectors having multivariate normal distributions, each with a sparse precision matrix: $$
\begin{aligned}
Y & = Y_1 + Y_2, \\
Y_1 & \sim N(0, Q_1^{-1}), \\
Y_2 & \sim N(0, Q_2^{-1}),
\end{aligned}
$$ where Y, Y1 and Y2 are n × 1, and Q1 and Q2 are sparse n × n positive-definite matrices. Marginally, the distribution of Y is Y ∼ N(0, Q1−1 + Q2−1), which has density $$ p(Y) \propto \left| Q_1^{-1} + Q_2^{-1} \right|^{-1/2} \exp\left\{ -\frac{1}{2} Y' \left( Q_1^{-1} + Q_2^{-1} \right)^{-1} Y \right\}. $$
Unfortunately, the matrix Q1−1 + Q2−1 is unlikely to be sparse, so evaluating the density directly is not feasible for large n. By mapping A → Q1, B → Q2, and X → In, the problem can be solved with a WoodburyMatrix.
A very simple example that fits in this framework occurs in time series analysis. Suppose that Y1 = (Y1, 1, …, Y1, n)′ is an AR(1) process in time, and Y2 is i.i.d. normal with variance σ2. That is, Y1, t = ρY1, t − 1 + ϵt with ρ ∈ (−1, 1) and ϵt ∼ N(0, 1). It can be shown that the precision matrix for Y1 is given by the tridiagonal matrix $$ Q_1 = \begin{pmatrix} 1 & -\rho & & & & \\ -\rho & 1 + \rho^2 & -\rho & & & \\ & -\rho & 1 + \rho^2 & -\rho & & \\ & & \ddots & \ddots & \ddots & \\ & & & -\rho & 1 + \rho^2 & -\rho \\ & & & & -\rho & 1 \end{pmatrix}, $$ which is very sparse for large n. Of course, the precision matrix for Y2, Q2 = σ−2In, is also sparse. For n = 10000, ρ = 0.95, σ2 = 1 this can be coded as
n <- 10000
rho <- 0.95
Q_1 <- bandSparse(
  n,
  diagonals = list(
    c(1, rep(1 + rho ^ 2, n - 2), 1),  # main diagonal of Q_1
    rep(-rho, n - 1)                   # first off-diagonal
  ),
  k = c(0, 1),
  symmetric = TRUE
)
Q_2 <- Diagonal(n)
Sigma_Y <- WoodburyMatrix(Q_1, Q_2)
Incidentally, the automatic symmetry detection worked here, so we didn't have to specify symmetric = TRUE.
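One quick way to verify this (a check added here for illustration) is to inspect the class of the result, which should be the symmetric subtype described earlier:

class(Sigma_Y)  # SWoodburyMatrix if symmetry was detected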
We can draw a sample of Y, and plot it:
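A sketch of how this can be done using rwnorm (the zero mean vector is passed explicitly here, and the exact plotting choices are illustrative):

Y <- rwnorm(1, rep(0, n), Sigma_Y)
plot(as.numeric(Y), type = 'l', xlab = 't', ylab = 'Y')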
And finally we can calculate the log density of the sample very efficiently:
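Using dwnorm, following the same pattern as the regression example (a sketch; timings are machine-dependent):

system.time(print(dwnorm(Y, rep(0, n), Sigma_Y, log = TRUE)))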
A more general form allows for multiple B matrices: $$
W = A^{-1} + \sum_{i = 1}^k U_i B_i^{-1} V_i,
$$ and analogously for the symmetric form. You can use this form by providing a list of matrices as the B (or U, V or X) arguments. Internally, this is implemented by converting to the standard form by letting B = bdiag(...the B matrices...), U = cbind(...the U matrices...), and so on. Here is an example involving two B matrices of varying sizes:
n <- 500
p1 <- 50
p2 <- 10
A <- rsparsematrix(n, n, 0.5)
B1 <- rsparsematrix(p1, p1, 0.9)
B2 <- rsparsematrix(p2, p2, 0.9)
U1 <- rsparsematrix(n, p1, 0.9)
U2 <- rsparsematrix(n, p2, 0.9)
W <- WoodburyMatrix(A, list(B1, B2), U = list(U1, U2), V = list(t(U1), t(U2)))
b <- rnorm(n)
str(solve(W, b))
#> num [1:500] 99.8 -103.1 -129.5 137.7 98.5 ...
For both general and symmetric matrices, heavy implicit use is made of the fact that the Matrix package caches matrix factorisations. This means that, generally speaking, the heavy lifting of decomposing the matrices is done on the first operation like solve or determinant, and subsequent calls are much faster.
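A rough way to observe this effect, reusing the matrices from the previous example (W_fresh is a name introduced for this sketch, and timings are machine-dependent):

W_fresh <- WoodburyMatrix(A, list(B1, B2), U = list(U1, U2), V = list(t(U1), t(U2)))
system.time(solve(W_fresh, b))  # first solve: factorises A, the B_i and O
system.time(solve(W_fresh, b))  # second solve: reuses the cached factorisations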
Max A. Woodbury, Inverting modified matrices, Memorandum Rept. 42, Statistical Research Group, Princeton University, Princeton, NJ, 1950, 4pp.
See also the Wikipedia pages for the Woodbury matrix identity and the matrix determinant lemma.