The next part of this course will cover numerical linear algebra.
Recall that an eigenvalue of a square matrix $A \in \mathbb{R}^{d \times d}$ is a scalar $\lambda$ such that $A v = \lambda v$ for some nonzero vector $v$, called an eigenvector.
When the matrix $A$ has $d$ linearly independent eigenvectors, it can be written as $A = V \Lambda V^{-1}$, where the columns of $V$ are the eigenvectors and $\Lambda$ is a diagonal matrix of the corresponding eigenvalues.
Decomposing a matrix into its eigenvectors and eigenvalues is called eigendecomposition.
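As a quick illustration (my own sketch, not part of the original notes), the following NumPy snippet computes the eigendecomposition of a small, arbitrarily chosen symmetric matrix and checks the defining property $A v = \lambda v$:

```python
import numpy as np

# A small symmetric matrix (chosen arbitrarily for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# np.linalg.eigh returns the eigenvalues and orthonormal eigenvectors of a symmetric matrix.
eigvals, V = np.linalg.eigh(A)

# Check the defining property A v = lambda v for each eigenpair.
for lam, v in zip(eigvals, V.T):
    assert np.allclose(A @ v, lam * v)

# Reconstruct A from its eigendecomposition: A = V diag(lambda) V^T.
assert np.allclose(V @ np.diag(eigvals) @ V.T, A)
```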
While eigendecomposition only applies to square matrices, we can
extend the idea to rectangular matrices with a related tool called
singular value decomposition. But first, let’s review
eigendecomposition. If a square matrix $A \in \mathbb{R}^{d \times d}$ is symmetric, then it has an eigendecomposition $A = V \Lambda V^\top$ in which the eigenvector matrix $V$ is orthogonal: $V^\top V = V V^\top = I$.
That is, $V^{-1} = V^\top$ and the columns of $V$ are orthonormal.
This implies that, for any vector $x$, $\|V x\|_2 = \|x\|_2$.
To see this, we can write $\|V x\|_2^2 = (V x)^\top (V x) = x^\top V^\top V x = x^\top x = \|x\|_2^2$.
We have the same property for the Frobenius norm of a matrix. For any
square orthogonal matrix $V$ and any matrix $A$ of compatible dimensions, $\|V A\|_F = \|A\|_F$.
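Here is a short numerical check of both norm-preservation facts; the random orthogonal matrix is obtained from a QR factorization, which is just one convenient way to generate one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random orthogonal matrix V from the QR factorization of a random matrix.
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))

x = rng.standard_normal(5)
A = rng.standard_normal((5, 5))

# Multiplying a vector by an orthogonal matrix preserves its Euclidean norm.
assert np.isclose(np.linalg.norm(V @ x), np.linalg.norm(x))

# The same is true for the Frobenius norm of a matrix.
assert np.isclose(np.linalg.norm(V @ A, 'fro'), np.linalg.norm(A, 'fro'))
```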
Singular value decomposition is one of the most fundamental results
in linear algebra. Without loss of generality, suppose that $A \in \mathbb{R}^{n \times d}$ with $n \ge d$. Then we can always write $A = U \Sigma V^\top$, where $U \in \mathbb{R}^{n \times d}$ and $V \in \mathbb{R}^{d \times d}$ have orthonormal columns and $\Sigma \in \mathbb{R}^{d \times d}$ is a diagonal matrix with non-negative entries $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_d \ge 0$, called the singular values of $A$.
Singular values are unique but factors are not. We would still get a
valid decomposition if we multiply the $i$th column of $U$ and the $i$th column of $V$ both by $-1$.
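The sketch below (with an arbitrary random matrix) computes a thin SVD with NumPy and verifies the sign ambiguity: flipping the sign of a matching column of $U$ and row of $V^\top$ still reconstructs $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))  # a tall rectangular matrix

# Thin SVD: U is 6x3, s holds the 3 singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Singular values are unique, but the factors are not: flipping the sign of
# column i of U together with row i of Vt gives another valid decomposition.
U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1
assert np.allclose(U2 @ np.diag(s) @ Vt2, A)
```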
For the eigendecomposition, we showed that the orthogonal factor $V$ satisfies $V^\top V = V V^\top = I$ and that multiplying by it preserves norms.
These properties are not true for rectangular matrices. Let $U \in \mathbb{R}^{n \times d}$ with $n > d$ have orthonormal columns. Then $U^\top U = I_d$, but $U U^\top \ne I_n$ in general.
Similarly, for any $x \in \mathbb{R}^d$ and $y \in \mathbb{R}^n$:
Multiplying a vector by a matrix with orthonormal columns preserves its norm: $\|U x\|_2 = \|x\|_2$.
Multiplying a vector by a matrix like $U^\top$ can only shrink its norm, since it drops any component of $y$ outside the column span of $U$.
So we always have that $\|U^\top y\|_2 \le \|y\|_2$, with equality exactly when $y$ lies in the column span of $U$.
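A small numerical check of these facts, assuming the setup above (a tall matrix $U$ with orthonormal columns, obtained here from a thin SVD of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(2)

# U has orthonormal columns (6x3), so U^T U = I but U U^T != I.
U, _, _ = np.linalg.svd(rng.standard_normal((6, 3)), full_matrices=False)
assert np.allclose(U.T @ U, np.eye(3))

x = rng.standard_normal(3)
y = rng.standard_normal(6)

# Multiplying by U preserves the norm of x ...
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))

# ... while multiplying by U^T can only shrink the norm of y.
assert np.linalg.norm(U.T @ y) <= np.linalg.norm(y) + 1e-12
```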
An important takeaway from singular value decomposition is how to
view matrix multiplication. We can view multiplying any vector $x$ by a matrix $A = U \Sigma V^\top$ as a three-step process:
Rotate and/or reflect the vector (multiplication by $V^\top$).
Scale the coordinates (multiplication by $\Sigma$).
Rotate and/or reflect the vector again (multiplication by $U$).
We can see this because $A x = U \Sigma V^\top x = U (\Sigma (V^\top x))$.
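The following sketch traces the three steps explicitly for an arbitrary square matrix and vector, and confirms they reproduce $A x$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A)

step1 = Vt @ x          # rotate/reflect the vector
step2 = s * step1       # scale each coordinate by a singular value
step3 = U @ step2       # rotate/reflect again

assert np.allclose(step3, A @ x)
```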
Let’s compare singular value decomposition to eigendecomposition.
Singular value decomposition exists for all matrices (square or rectangular) while eigendecomposition only exists for some square matrices.
Singular values are always non-negative while eigenvalues can be positive or negative.
The factors $U$ and $V$ of the singular value decomposition always have orthonormal columns, while the eigenvector matrix in an eigendecomposition is only orthogonal in special cases (for example, when $A$ is symmetric).
We can connect singular value decomposition with eigendecomposition
by considering the matrix $A^\top A$: if $A = U \Sigma V^\top$, then $A^\top A = V \Sigma^2 V^\top$, so the right singular vectors of $A$ are eigenvectors of $A^\top A$ and the squared singular values $\sigma_i^2$ are its eigenvalues. Similarly, $A A^\top = U \Sigma^2 U^\top$.
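A quick numerical sanity check of this connection: the eigenvalues of $A^\top A$ are the squared singular values of $A$, and its eigenvectors match the right singular vectors up to sign (the matrix here is random and only for illustration).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3))

_, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigendecomposition of the symmetric matrix A^T A.
eigvals, V_eig = np.linalg.eigh(A.T @ A)

# eigh returns eigenvalues in ascending order; reverse to match s (descending).
assert np.allclose(np.sqrt(eigvals[::-1]), s)

# Eigenvectors agree with the right singular vectors up to sign.
for i in range(3):
    v_eig = V_eig[:, ::-1][:, i]
    v_svd = Vt[i]
    assert np.isclose(abs(v_eig @ v_svd), 1.0)
```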
There are many applications of SVD including:
Computing the pseudoinverse $A^+ = V \Sigma^{-1} U^\top$ (when $A$ has full column rank); see the sketch after this list.
Reading off the condition number of $A$: $\kappa(A) = \sigma_1 / \sigma_d$.
Computing matrix norms like $\|A\|_2 = \sigma_1$ and $\|A\|_F = \sqrt{\sum_i \sigma_i^2}$.
Computing the matrix square root of a symmetric positive semi-definite matrix.
Performing principal component analysis.
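As one example from the list above, here is a hedged sketch of the pseudoinverse and condition number computed directly from the SVD, assuming $A$ has full column rank; NumPy's built-in `np.linalg.pinv` and `np.linalg.cond` are used only to check the result.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Pseudoinverse: A^+ = V diag(1/sigma) U^T (valid when all singular values are nonzero).
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# The condition number is the ratio of the largest to smallest singular value.
assert np.isclose(s[0] / s[-1], np.linalg.cond(A))
```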
We’ll focus on a particularly useful application called low-rank
approximations of a matrix $A$.
Low-rank approximations are very useful when our data has some
structure. For example, if a dataset only has a small number $k$ of underlying degrees of freedom, then the data matrix is (approximately) rank $k$ even though it may be stored with many more rows and columns.
We can exploit low-rank structure by using low-rank approximations to reduce the dimensionality of the data or to visualize the data in a lower dimensional space. Examples include data embeddings like word2vec or node2vec, reduced order modeling for solving physical equations, constructing preconditioners in optimization, and noisy triangulation.
Because low-rank approximations are so useful, we would like to know
how to find the best low-rank approximation to a matrix $A$. Formally, given a target rank $k$, we want the matrix $A_k$ that minimizes $\|A - B\|_F$ over all matrices $B$ with rank at most $k$.
Let $A = U \Sigma V^\top$ be the singular value decomposition of $A$, with singular values $\sigma_1 \ge \dots \ge \sigma_d \ge 0$ and corresponding left and right singular vectors $u_i$ and $v_i$.
We will repeatedly use two observations: the squared Frobenius norm of a matrix is the sum of its squared entries (equivalently, of its squared column norms), and multiplying by an orthogonal matrix does not change the Frobenius norm.
We can write $A = \sum_{i=1}^{d} \sigma_i u_i v_i^\top$, a sum of rank-one matrices.
Then keeping only the first $k$ terms gives a rank-$k$ matrix $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top = U_k \Sigma_k V_k^\top$, called the truncated SVD, where $U_k$ and $V_k$ contain the first $k$ singular vectors and $\Sigma_k$ the $k$ largest singular values.
Then we choose this truncated SVD as our candidate for the best rank-$k$ approximation; the rest of the argument shows that no rank-$k$ matrix does better.
Claim (Orthogonal Projection): Consider any
orthogonal matrix $Q$ (or, more generally, any matrix with orthonormal columns) and any matrix $M$ with compatible dimensions. Then $\|Q M\|_F = \|M\|_F$, while multiplying by the projection $Q^\top$ can only shrink the norm: $\|Q^\top M\|_F \le \|M\|_F$.
We will prove the orthogonal claim later. Using the claim, observe
that the problem of finding the best rank-$k$ approximation to $A$ reduces to approximating the diagonal matrix of singular values: using the full SVD, in which $U$ and $V$ are square orthogonal matrices, $\|A - B\|_F = \|U^\top (A - B) V\|_F = \|\Sigma - U^\top B V\|_F$ for any $B$, and $U^\top B V$ has the same rank as $B$. The best rank-$k$ approximation to a non-negative diagonal matrix simply keeps its $k$ largest entries and zeros out the rest, which is exactly the truncated SVD of that diagonal matrix.
Now that we know the truncated SVD of a diagonal matrix is the best rank-$k$ approximation to it, translating back through $U$ and $V$ shows that the truncated SVD $A_k = U_k \Sigma_k V_k^\top$ is the best rank-$k$ approximation to $A$.
We will show the orthogonal claim from the observation that the columns of $Q M$ are $Q$ times the corresponding columns of $M$: multiplying a vector by a matrix with orthonormal columns preserves its $2$-norm, so every column of $Q M$ has the same norm as the matching column of $M$, and the squared Frobenius norm is just the sum of these squared column norms. (The inequality for $Q^\top$ follows in the same way, since multiplying a vector by $Q^\top$ can only shrink its norm.)
The orthogonal projection claim combined with our characterization of
the optimal low rank approximation gives us $\|A - A_k\|_F^2 = \sum_{i=k+1}^{d} \sigma_i^2$.
We can see this because $A - A_k = \sum_{i=k+1}^{d} \sigma_i u_i v_i^\top$, which by the orthogonal claim has the same Frobenius norm as a diagonal matrix whose nonzero entries are $\sigma_{k+1}, \dots, \sigma_d$.
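A numerical check of this error formula, using a random matrix and an arbitrary target rank:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((20, 10))
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD: keep only the k largest singular values and their vectors.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The squared Frobenius error equals the sum of the squared discarded singular values.
err = np.linalg.norm(A - A_k, 'fro') ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```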
The characterization of our low-rank approximation error in terms of the singular values gives a sense of how low-rank a matrix is. Data with structure will have a small number of large singular values and a large number of small singular values.
In contrast, data with no structure will have singular values that are all roughly the same size.
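To make this concrete, the sketch below compares the singular values of a noisy low-rank matrix with those of a pure noise matrix; the sizes, rank, and noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, k = 200, 50, 5

# Structured data: rank-k signal plus a little noise.
structured = (rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
              + 0.1 * rng.standard_normal((n, d)))

# Unstructured data: pure noise.
unstructured = rng.standard_normal((n, d))

s_structured = np.linalg.svd(structured, compute_uv=False)
s_unstructured = np.linalg.svd(unstructured, compute_uv=False)

# The structured matrix has k dominant singular values followed by a sharp drop.
print(np.round(s_structured[:8], 1))
# The noise matrix's singular values decay gradually with no sharp drop.
print(np.round(s_unstructured[:8], 1))
```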
Now that we know the best low-rank approximation is the truncated SVD, all that remains is to find the SVD.
We can find the SVD with the following approach:
Compute the $d \times d$ matrix $A^\top A$.
Find the eigendecomposition of $A^\top A = V \Lambda V^\top$; the eigenvectors give $V$ and the singular values are $\sigma_i = \sqrt{\lambda_i}$.
Finally, compute $U = A V \Sigma^{-1}$.
The total time complexity is $O(n d^2)$: forming $A^\top A$ takes $O(n d^2)$ time, its eigendecomposition takes $O(d^3)$ time, and recovering $U$ takes another $O(n d^2)$ time (recall $n \ge d$).
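Here is a sketch of this approach in NumPy, checked against `np.linalg.svd`; it assumes $A$ has full column rank so that $\Sigma$ is invertible (and note that forming $A^\top A$ can lose accuracy for ill-conditioned matrices).

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((100, 6))

# Step 1: form the (d x d) Gram matrix A^T A.
gram = A.T @ A

# Step 2: eigendecomposition of A^T A gives V and the squared singular values.
eigvals, V = np.linalg.eigh(gram)
order = np.argsort(eigvals)[::-1]          # sort in decreasing order
sigma = np.sqrt(eigvals[order])
V = V[:, order]

# Step 3: recover U from A V Sigma^{-1} (divide each column of A V by sigma).
U = A @ V / sigma

# Compare against NumPy's SVD (singular vectors match up to sign).
U_np, s_np, Vt_np = np.linalg.svd(A, full_matrices=False)
assert np.allclose(sigma, s_np)
assert np.allclose(np.abs(np.diag(V.T @ Vt_np.T)), 1.0)
assert np.allclose(U @ np.diag(sigma) @ V.T, A)
```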
We can save time by computing an approximate solution to the SVD. In
particular, we will only compute the top $k$ singular values and the corresponding singular vectors.
Krylov subspace methods like the Lanczos method are most commonly used in practice.
The power method is the simplest Krylov subspace method and still works very well.
We will focus on the power method next time.
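As a preview (a minimal sketch under simplifying assumptions, not the version we will develop next time), the power method for the top right singular vector repeatedly applies $A^\top A$ to a random vector and renormalizes:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((50, 8))

# Power method sketch for the top right singular vector of A.
v = rng.standard_normal(8)
v /= np.linalg.norm(v)
for _ in range(100):                 # fixed iteration count for simplicity
    v = A.T @ (A @ v)                # multiply by A^T A without forming it
    v /= np.linalg.norm(v)

sigma_1 = np.linalg.norm(A @ v)      # estimate of the top singular value

# The estimate should closely match the largest singular value from a full SVD.
print(sigma_1, np.linalg.svd(A, compute_uv=False)[0])
```

Each iteration only needs matrix-vector products with $A$ and $A^\top$, which is what makes this kind of method attractive for large sparse matrices.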