Contents

Dimension reduction with Principal Component Analysis

There are different technique to reduce dimensionality. One of the most known is Principal Component Analysis (PCA). Probably the most known one. The aim of this article is to explain the math behind PCA. It is based on eigen decomposition.

/images/2022/pca/pca_plot_viz.png
Figure 1 : PCA

PCA is based on the properties of symmetric matrices and eigen decomposition.
Let’s consider the set of vectors { \( x_1,…,x_n \) } that represent our data.

How to apply PCA

Step 1 : Compute the mean and substract it to get data with null Esperance

$$x_i=x_i-\frac{1}{n} \sum_{i=1}^n x_i$$

Step 2 : Compute the covariance matrix

$$C = \frac{1}{n} \sum _{i=1}^n x_ix_i^T$$

Step 3 : Eigen decomposition

For a matrix M, an eigen value \( \lambda \) with the associated eigen vector \( u \) verify \( Mu = \lambda u \).

Spectral theorem
For a symmetric matrix M, there is an eigen decomposition: $$M=\sum_{i=1}^n \lambda_i u_i u_i^T$$ where \( \lambda_i \) is the i-th eigen value with \( u_i \) the associated eigen vector. Besides, the eigen values are real.

The covariance matrix \(C\) is symmetric, so we can apply the eigen decomposition \(C = \sum_{i=1}^n \lambda_i u_i u_i^T \) with \( \lambda_1 \geq \lambda_2 \geq …. \geq \lambda_n \)

Step 4 : Projection onto eigenvectors

Let’s consider that we want to reduce the dimension from n to m. We just need to project the data onto the m first eigen vectors because they have the highest eigen values. We then get the result: $$ \forall i \in { 1,…,n } , x_i' = [u_1^Tx_i,….,u_m^Tx_i]^T$$

Generalisation to any kernel : Kernel PCA

Let’s consider a kernel with the expression \( k(x,y)= \phi(x) \phi(y)^T \). For instance a gaussian kernel can be written as \( k(x,y)= \frac{1}{\sqrt{2 \pi }\sigma} e^{- \frac{ {||x-y|| }^2}{2 \sigma^2} } \)
We can define \( \phi_i:=\phi(x_i) \) and consider the matrix \( C = \sum_{i=1}^n \phi_i \phi_i ^T \)

You may be able to better understand the PCA now. Do not hesitate if you have any questions.