Introduction

What are eigenvalues? What are singular values? They both describe the behavior of a matrix on a certain set of vectors. The difference is this: The eigenvectors of a matrix describe the directions of its invariant action. The singular vectors of a matrix describe the directions of its maximum action. And the corresponding eigen- and singular values describe the magnitude of that action.

They are defined this way. A scalar \(\lambda\) is an eigenvalue of a linear transformation \(A\) if there is a vector \(v\) such that \(A v = \lambda v\), and \(v\) is called an eigenvector of \(\lambda\). A scalar \(\sigma\) is a singular value of \(A\) if there are (unit) vectors \(u\) and \(v\) such that \(A v = \sigma u\) and \(A^* u = \sigma v\), where \(A^*\) is the conjugate transpose of \(A\); the vectors \(u\) and \(v\) are singular vectors. The vector \(u\) is called a left singular vector and \(v\) a right singular vector.

Eigenvalues and Eigenvectors

That eigenvectors give the directions of invariant action is obvious from the definition. The definition says that when \(A\) acts on an eigenvector, it just multiplies it by a constant, the corresponding eigenvalue. In other words, when a linear transformation acts on one of its eigenvectors, it shrinks the vector or stretches it and reverses its direction if \(\lambda\) is negative, but never changes the direction otherwise. The action is invariant.

Take this matrix, for instance:

\[ A = \begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix} \]

Eigenvectors of A
Eigenvectors of \(A\)

We can see how the transformation just stretches the red vector by a factor of 2, while the blue vector it stretches but also reflects over the origin.

And this matrix:

\[ A = \begin{bmatrix} 1 & \frac{1}{3} \\ \frac{4}{3} & 1 \end{bmatrix} \]

Eigenvectors of A
Eigenvectors of \(A\)

It stretches the red vector and shrinks the blue vector, but reverses neither.

The point is that in every case, when a matrix acts on one of its eigenvectors, the action is always in a parallel direction.

Singular Values and Singular Vectors

This invariant direction does not necessarily give the transformation’s direction of greatest effect, however. You can see that in the previous example. But say \(\sigma_1\) is the largest singular value of \(A\) with right singular vector \(v\). Then \(v\) is a solution to

\[ \operatorname*{argmax}_{x, ||x||=1} ||A x|| \]

In other words, \( ||A v|| = \sigma_1 \) is at least as big as \( ||A x|| \) for any other unit vector \(x\). It’s not necessarily the case that \(A v\) is parallel to \(v\), though.

Compare the eigenvectors of the matrix in the last example to its singular vectors:

Singular vectors of A
Singular vectors of \(A\)

The directions of maximum effect will be exactly the semi-axes of the ellipse, the ellipse which is the image of the unit circle under \(A\).

Let’s extend this idea to 3-dimensional space to get a better idea of what’s going on. Consider this transformation:

\[A = \begin{bmatrix} \frac{3}{2} \, \sqrt{2} & -\sqrt{2} & 0 \\ \frac{3}{2} \, \sqrt{2} & \sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

This will have the effect of transforming the unit sphere into an ellipsoid:

The unit sphere transformed into an ellipsoid.
The unit sphere transformed into an ellipsoid.

Its singular values are 3, 2, and 1. You can see how they again form the semi-axes of the resulting figure.

The singular vectors as semi-axes in the ellipsoid.
The singular vectors as semi-axes in the ellipsoid.

Matrix Approximation with SVD

Now, the singular value decomposition (SVD) will tell us what \(A\)’s singular values are:

\[ A = U \Sigma V^* = \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0.0 \\ \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0.0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

The diagonal entries of the matrix \(\Sigma\) are the singular values of \(A\). We can obtain a lower-dimensional approximation to \(A\) by setting one or more of its singular values to 0.

For instance, say we set the largest singular value, 3, to 0. We then get this matrix:

\[ A_1 = \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0.0 \\ \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0.0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & -\frac{\sqrt{2}}{2} & 0 \\ 0 & \frac{\sqrt{2}}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

which transforms the unit sphere like this:

The transformation with the largest singular value set to 0.
The transformation with the largest singular value set to 0.

The resulting figure now lives in a 2-dimensional space. Further, the largest singular value of \(A_1\) is now 2. Set it to 0:

\[ A_2 = \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0.0 \\ \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0.0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

And we get a 1-dimensional figure, and a final largest singular value of 1:

The transformation with the two largest singular values set to 0.
The transformation with the two largest singular values set to 0.

This is the point: Each set of singular vectors will form an orthonormal basis for some linear subspace of \(\mathbb{R}^n\). A singular value and its singular vectors give the direction of maximum action among all directions orthogonal to the singular vectors of any larger singular value.

This has important applications. There are many problems in statistics and machine learning that come down to finding a low-rank approximation to some matrix at hand. Principal component analysis is a problem of this kind. It says: approximate some matrix \(X\) of observations with a number of its uncorrelated components of maximum variance. This problem is solved by computing its singular value decomposition and setting some of its smallest singular values to 0.

Low-rank approximations of A.
Low-rank approximations of \(A\).

Introduction

Say \(V\) and \(W\) are vector spaces with scalars in some field \(\mathbb{F}\) (the real numbers, maybe). A linear map is a function \(T : V \rightarrow W \) satisfying two conditions:

  • additivity \(T(x + y) = T x + T y\) for all \(x, y \in V\)
  • homogeneity \(T(c x) = c (T x)\) for all \(c \in \mathbb{F} \) and all \(x \in V\)

Defining a linear map this way just ensures that anything that acts like a vector in \(V\) also acts like a vector in \(W\) after you map it over. It means that the map preserves all the structure of a vector space after it’s applied.

It’s a simple definition – which is good – but doesn’t speak much to the imagination. Since linear algebra is possibly the most useful and most ubiquitous of all the branches of mathematics, we’d like to have some intuition about what linear maps are so we have some idea of what we’re doing when we use it. Though not all vectors live there, the Euclidean plane \(\mathbb{R}^2\) is certainly the easiest to visualize, and the way we measure distance there is very similar to the way we measure error in statistics, so we can feel that our intuitions will carry over.

It turns out that all linear maps in \(\mathbb{R}^2\) can be factored into just a few primitive geometric operations: scaling, rotation, and reflection. This isn’t the only way to factor these maps, but I think it’s the easiest to understand. (We could get by without rotations, in fact.)

The unit circle, rotated, reflected, and scaled.
The unit circle, rotated, reflected, and scaled.

Three Primitive Transformations

Scaling

A (non-uniform) scaling transformation \(D\) in \(\mathbb{R}^2\) is given by a diagonal matrix:

\[Scl(d1, d2) = \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \\ \end{bmatrix}\]

where \(d_1\) and \(d_2\) are non-negative. The transformation has the effect of stretching or shrinking a vector along each coordinate axis, and, so long as \(d_1\) and \(d_2\) are positive, it will also preserve the orientation of vectors after mapping because in this case \(\det(D) = d_1 d_2 > 0\).

For instance, here is the effect on a vector of this matrix: \[D = \begin{bmatrix} 0.75 & 0 \\ 0 & 1.25 \\ \end{bmatrix}\]

A vector, scaled.
A vector, scaled.

It will shrink a vector by a factor of 0.75 along the x-axis and stretch a vector by a factor of 1.25 along the y-axis.

If we think about all the vectors of length 1 as being the points of the unit circle, then we can get an idea of how the transformation will affect any vector. We can see a scaling as a continous transformation beginning at the identity matrix.

If one of the diagonal entries is 0, then it will collapse the circle on the other axis.

\[D = \begin{bmatrix} 0 & 0 \\ 0 & 1.25 \\ \end{bmatrix}\]

This is an example of a rank-deficient matrix. It maps every vector onto the y-axis, and so its image has a dimension less than the dimension of the full space.

Rotation

A rotation transformation \(Ref\) is given by a matrix: \[Ref(\theta) = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \\ \end{bmatrix}\]

This transformation will have the effect of rotating a vector counter-clockwise by an angle \(\theta\), when \(\theta\) is positive, and clockwise by \(\theta\) when \(\theta\) is negative.

A vector, rotated by 3\pi/4
A vector, rotated by \(3\pi/4\)

And the unit circle gets mapped onto itself.

It shouldn’t be too hard to convince ourselves that the matrix we’ve written down is the one we want. Take some unit vector and write its coordinates like \((\cos\gamma, \sin\gamma)\). Multiply it by \(Ref(\theta)\) to get \((\cos\gamma \cos\theta - \sin\gamma \sin\theta, \cos\gamma \sin\theta + \sin\gamma \cos\theta)\). But by a trigonometric identity, this is exactly the vector \((\cos(\gamma + \theta), \sin(\gamma + \theta))\), which is our vector rotated by \(\theta\).

A rotation should preserve not only orientations, but also distances. Now, recall that the determinant for a \(2\times 2\) matrix \(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\) is \(a d - b c\). So a rotation matrix will have determinant \(\cos^2(\theta) + \sin^2(\theta)\), which, by the Pythagorean identity, is equal to 1. This, together with the fact that its columns are orthonormal means that it does preserve both. It is a kind of orthogonal matrix, which is a kind of isometry.

Reflection

A reflection in \(\mathbb{R}^2\) can be described with matricies like: \[Ref(\theta) = \begin{bmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \\ \end{bmatrix}\] where the reflection is through a line crossing the origin and forming an angle \(\theta\) with the x-axis.

A vector, reflected over a line at angle \pi/4.
A vector, reflected over a line at angle \(\pi/4\).

And the unit circle gets mapped onto itself.

Note that the determinant of this matrix is -1, which means that it reverses orientation. But its columns are still orthonormal, and so it too is an isometry.

Decomposing Matricies into Primitives

The singular value decomposition (SVD) will factor any matrix \(A\) having like this:

\[ A = U \Sigma V^* \]

We are working with real matricies, so \(U\) and \(V\) will both be orthogonal matrices. This means each of these will be either a reflection or a rotation, depending on the pattern of signs in its entries. The matrix \(\Sigma\) is a diagonal matrix with non-negative entries, which means that it is a scaling transform. (The \(*\) on the \(V\) is the conjugate-transpose operator, which just means ordinary transpose when \(V\) doesn’t contain any imaginary entries. So, for us, \(V^* = V^\top\).) Now with the SVD we can rewrite any linear transformation as:

  1. \(V^*\): Rotate/Reflect
  2. \(\Sigma\): Scale
  3. \(U\): Rotate/Reflect

Example

\[\begin{bmatrix} 0.5 & 1.5 \\ 1.5 & 0.5 \end{bmatrix} \approx \begin{bmatrix} -0.707 & -0.707 \\ -0.707 & 0.707 \end{bmatrix} \begin{bmatrix} 2.0 & 0.0 \\ 0.0 & 1.0 \end{bmatrix} \begin{bmatrix} -0.707 & -0.707 \\ 0.707 & -0.707 \end{bmatrix} \]

This turns out to be:

  1. \(V^*\): Rotate clockwise by \(\theta = \frac{3 \pi}{4}\).
  2. \(\Sigma\): Scale x-coordinate by \(d_1 = 2\) and y-coordinate by \(d_2 = 1\).
  3. \(U\): Reflect over the line with angle \(-\frac{3\pi}{8}\).

Example

And here is a shear transform, represented as: rotation, scale, rotation.

\[\begin{bmatrix} 1.0 & 1.0 \\ 0.0 & 1.0 \end{bmatrix} \approx \begin{bmatrix} 0.85 & -0.53 \\ 0.53 & 0.85 \end{bmatrix} \begin{bmatrix} 1.62 & 0.0 \\ 0.0 & 0.62 \end{bmatrix} \begin{bmatrix} 0.53 & 0.85 \\ -0.85 & 0.53 \end{bmatrix} \]