# The Singular Value Decomposition

## The Action of a Symmetric Matrix

First of all, it is useful to understand the action of a symmetric matrix in a linear transformation, $\left.T: x \to Ax\right.$. We know that a symmetric matrix can be represented as $\left.A = PDP^T\right.$

where $\left.P\right.$ is an orthogonal matrix (that is, a matrix whose columns are orthogonal to each other, and of unit length). The columns of $\left.P\right.$ are (orthogonal) eigenvectors of $\left.A\right.$, and $\left.D\right.$ is the diagonal matrix of eigenvalues. Thus, when a vector $\left.x\right.$ is multiplied by $\left.A\right.$, $\left.Ax = PDP^Tx\right.$

the geometric result can be visualized using the unit ball (of unit radius) in $\left.R^n\right.$. It is transformed into an ellipsoid, with the eigenvectors representing the axes of the ellipsoid, and the eigenvalues providing the scaling of each axis of the ellipsoid. We can also think of how this transformation comes about as a succession of three steps:

• Multiplication by $\left.P^T\right.$, by the transpose of the orthogonal matrix $\left.P\right.$, corresponds to a rotation of the space $\left.R^n\right.$. It rotates each of the eigenvectors $\left.p_i\right.$ into the standard vector $\left.e_i\right.$.
• Multiplication by $\left.D\right.$ represents a scaling of each of the now "standardized" eigenvectors: each is stretched by the appropriate factor, the eigenvalue;
• Multiplication of the result by $\left.P\right.$ rotates the scaled eigenvectors back into their original positions.

In conclusion, you can think of the "action" of a symmetric matrix as a product of these three simple actions on $\left.R^n\right.$: a rotation, a scaling, and a rotation.

## The SVD Theorem

It turns out that this aspect of symmetric matrices is true in general: every matrix can be thought of as the product of an orthogonal matrix, a diagonal matrix, and the transpose of an orthogonal matrix.

Every matrix $\left.A\right.$ is related to two important symmetric matrices: $\left.A^TA\right.$ and $\left.AA^T\right.$. Since each of these two matrices is symmetric, each can be represented as a product $\left. \begin{cases} AA^T = UD_1U^T\\ A^TA = VD_2V^T \end{cases} \right.$

Furthermore, the diagonal matrices $\left.D_1\right.$ and $\left.D_2\right.$ have positive entries on the diagonal. We can see that from the spectral decomposition of $\left.AA^T\right.$ and $\left.A^TA\right.$. For example, $\left.AA^T=\lambda_1 u_1 u_1^T + \ldots + \lambda_m u_m u_m^T\right.$

Hence, $\left.u_1^T AA^T u_1 = \lambda_1 u_1^T u_1 u_1^T u_1 + 0 = \lambda_1 (u_1 \cdot u_1) (u_1 \cdot u_1) = \lambda_1\right.$,

but since $\left.u_1^T AA^T u_1 = A^Tu_1 \cdot A^Tu_1 = ||A^Tu_1||^2 \ge 0\right.$, we know that $\left.\lambda_1 \ge 0\right.$.

Theorem: Let $\left.A\right.$ be an $\left.m x n\right.$ matrix with real components. Then $\left.A=U\Sigma V^T\right.$ where $\left.U\right.$ and $\left.V\right.$ are as defined above, and $\left.\Sigma\right.$ is the matrix with positive ( $\left.\ge 0\right.$) entries such that $\left.\Sigma \Sigma^T = D_1\right.$ and $\left.\Sigma^T \Sigma = D_2\right.$.

We can easily show that, with $\left.A=U\Sigma V^T\right.$, the formulas for $\left.AA^T\right.$ and $\left.A^TA\right.$ work out: $\left. \begin{cases} AA^T = U\Sigma V^T V \Sigma^T U^T = U\Sigma \Sigma^T U^T = U D_1 U^T\\ A^TA = V \Sigma^T U^T U \Sigma V^T = V \Sigma^T \Sigma V^T = V D_2 V^T \end{cases} \right.$

Notice that the orthogonal diagonalization of a symmetric matrix is the singular value decomposition in that case.

What follows is an illustration of the SVD, as diagrammed by Cliff Long and Tom Hern in the case of $\left.R^2\right.$. In this example, the notation is $\left.U=Q_1\right.$ and $\left.V=Q_2\right.$.

You see that the action of a general matrix $\left.A\right.$ can also be viewed as a rotation, followed by a scaling, followed by a rotation. If $\left.A\right.$ is not square, or not full rank, then there will be a null space, and some of the dimensions will be annihilated.

## The SVD for image processing, image analysis, and statistical analysis

If $\left.A\right.$ is rank $\left.n\right.$, then $\left.A\right.$ can be written as a spectral decomposition, too, in terms of the eigenvectors and the eigenvalues of $\left.A^TA\right.$ and $\left.AA^T\right.$. The eigenvectors of $\left.A^TA\right.$ and $\left.AA^T\right.$ are called the singular vectors of $\left.A\right.$: $\left.A=\sqrt{\lambda_1} u_1 v_1^T + \ldots + \sqrt{\lambda_n} u_n v_n^T\right.$ $\left.A=\sigma_1 u_1 v_1^T + \ldots + \sigma_n u_n v_n^T\right.$

and where, by convention, $\left.\lambda_i \ge \lambda_j\right.$ for $\left.i \le j\right.$. The values $\left.\sigma_i=\sqrt{\lambda_i}\right.$ are called the singular values of the matrix. This is crucial for image processing: often an image is not full rank, so that there are few products; furthermore, it may be that an image contains noise, and the noise tends to be high frequency and of little total weight. It may be that it "hangs out" on the last few singular vector pairs. We simply drop those from the recomposition of the matrix $\left.A\right.$, and we will have de-noised the image.

On a similar note, the information contained in the last singular vector pairs may not be very important to the recomposition of the image, so that we can drop them without much loss of information. This represents compression of the image.

## The SVD summarizes the basic spaces related to a matrix $\left.A\right.$

• The Col( $\left.A\right.$) is given by the columns of $\left.U\right.$ corresponding to non-zero singular values.
• The Row( $\left.A\right.$) is given by the columns of $\left.V\right.$ corresponding to non-zero singular values.
• The Nul( $\left.A\right.$) is given by the columns of $\left.V\right.$ corresponding to zero singular values; and of course
• The Nul( $\left.A^T\right.$) is given by the columns of $\left.U\right.$ corresponding to zero singular values.

## Applications

• We can use the SVD to interpolate a matrix. See Long, A. and C. Long. Surface Approximation and Interpolation Via Matrix SVD. The College Mathematics Journal, Vol. 32, #1, January, 2001:20-25.
• When we can't invert a matrix, we can define a "pseudo-inverse" via the SVD. This is precisely what we do when we solve the linear regression/least squares problem in general: we can't solve $\left.Ax=b\right.$ because the system is inconsistent; but we can find a least-squares solution, by inverting (when possible). When it's not possible, we use a pseudo-inverse solution.
• The SVD is used for image compression, as you can see by visiting this page.