PCA (Principal Component Analysis) is a dimensionality-reduction method: it transforms a large set of variables into a smaller one that still contains most of the information in the original set.
Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity: smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze them faster because there are fewer extraneous variables to process.
PCA can be performed in two alternative ways (both give the same components, as the short sketch after this list illustrates):
- Eigenvalue Decomposition of the covariance matrix
- Singular Value Decomposition of the data matrix
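Both routes lead to the same components: the eigenvalues of the covariance matrix equal the squared singular values of the demeaned data matrix divided by $T$, and the eigenvectors match the right singular vectors up to sign. The following is a minimal NumPy sketch checking this equivalence on synthetic returns (the data and variable names are purely illustrative):

```python
import numpy as np

# Synthetic T x N matrix of "returns" (illustrative data only).
rng = np.random.default_rng(0)
T, N = 500, 4
X = rng.normal(size=(T, N)) @ rng.normal(size=(N, N))

X_dm = X - X.mean(axis=0)          # center (demean) each column

# Route 1: eigenvalue decomposition of the N x N covariance matrix.
Sigma = X_dm.T @ X_dm / T
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]  # sort in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: singular value decomposition of the demeaned data matrix.
U, s, Vt = np.linalg.svd(X_dm, full_matrices=False)

# Squared singular values divided by T equal the eigenvalues, and the
# right singular vectors equal the eigenvectors (up to sign).
print(np.allclose(eigvals, s**2 / T))              # True
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))  # True
```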
Simple explanation with eigenvalue decomposition of the covariance matrix
- We have $N$ assets. $\underbrace{X}_{T \times N} = \left[\underbrace{r^{(1)}}_{T \times 1}, \ldots, r^{(N)} \right]$ is the time series of $N$ stock returns over $T$ periods
- Center (demean) the data: ${X}^{dm} = \left[r^{(1)}-\bar{r}_1 \times I_{T \times 1}, \ldots, r^{(N)}-\bar{r}_N \times I_{T \times 1} \right]$, where $\bar{r}_k$ is the mean return of asset $k$ and $I_{T \times 1}$ is a vector of ones
- $\Sigma$ is the corresponding $N\times N$ covariance matrix: $\Sigma := \mathbb{E}\left[(X-\mathbb{E}[X])^\top (X - \mathbb{E}[X])\right] = \frac{1}{T} \, {X^{dm}}^\top \, {X^{dm}}$
- Perform the diagonalization (eigenvalue decomposition) of the covariance matrix: $\Sigma = E \, \Lambda \, E^\top$
  - $E$ is the $N\times N$ eigenvector matrix; each column of $E$ contains one eigenvector. $\Lambda$ is the $N\times N$ diagonal matrix of eigenvalues; each entry on the main diagonal of $\Lambda$ is an eigenvalue $\lambda$
- The goal of PCA is to rotate $X^{dm}$ into a matrix $\tilde{X}^{dm}$ such that the resulting covariance matrix $\tilde{\Sigma}$ of $\tilde{X}^{dm}$ is diagonal
- Set $P = \mathrm{sort}_{desc}(E)$; this is the matrix we use to rotate the data so that the resulting covariance matrix is diagonal. If we sort the eigenvalues and the corresponding eigenvectors in descending order, the Principal Components will also come out sorted in descending order of importance.
- Calculate the rotated $\tilde{X}^{dm}$: $\underbrace{\tilde{X}^{dm}}_{T \times N} := \underbrace{{X}^{dm}}_{T \times N} \; \times \; \underbrace{P}_{N \times N}$
- Each column of $\tilde{X}^{dm}$ is a Principal Component of the data. The first column has the most significance, the last one the least (because the eigenvectors were sorted in descending order above); see the first sketch after this list.
- Use the eigenvalues to determine the proportion of variation that each $PC_i$ accounts for.
We can quantify the percentage of the variance that is captured by the $i$-th Principal Component $PC_i$ by the following equation: $percentage_{i} = \dfrac{\lambda_i}{\sum_{j=1}^N \lambda_j}$ (see the second sketch after this list)
- Decide whether to omit some of the Principal Components:
  - If you determine that some PCs account for only a small proportion of the variation (0.5%, 1%, 3%, etc.), you can omit them to reduce the complexity of your data without significant information loss
  - If you see that all PCs account for a significant proportion of the variation ($PC_1 - 40\%$, $PC_2 - 30\%$, $PC_3 - 30\%$), then you will lose a lot of information by omitting any one of them
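Below is a minimal NumPy sketch of the eigenvalue-decomposition steps above, using a synthetic return matrix as a stand-in for real data (sizes and variable names are illustrative):

```python
import numpy as np

# Hypothetical T x N matrix of asset returns (synthetic stand-in).
rng = np.random.default_rng(1)
T, N = 1000, 5
X = rng.normal(size=(T, N)) @ rng.normal(size=(N, N))

# Center (demean) the data column by column.
X_dm = X - X.mean(axis=0)

# N x N covariance matrix: Sigma = (1/T) X_dm^T X_dm.
Sigma = X_dm.T @ X_dm / T

# Eigenvalue decomposition: Sigma = E Lambda E^T.
lam, E = np.linalg.eigh(Sigma)

# Sort eigenvalues and eigenvectors in descending order -> rotation matrix P.
order = np.argsort(lam)[::-1]
lam = lam[order]
P = E[:, order]

# Rotate the data; each column of X_tilde is one Principal Component.
X_tilde = X_dm @ P

# The covariance of the rotated data is diagonal, with the sorted
# eigenvalues on the diagonal.
Sigma_tilde = X_tilde.T @ X_tilde / T
print(np.allclose(Sigma_tilde, np.diag(lam)))  # True
```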
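A second sketch covers the explained-variance percentages and the truncation decision; the 95% cutoff used here is only an illustrative choice, not a rule prescribed above:

```python
import numpy as np

# Synthetic stand-in for the return matrix.
rng = np.random.default_rng(2)
T, N = 1000, 5
X = rng.normal(size=(T, N)) @ rng.normal(size=(N, N))
X_dm = X - X.mean(axis=0)

# Eigenvalues of the covariance matrix; eigvalsh returns them in
# ascending order, so reverse to get them in descending order.
lam = np.linalg.eigvalsh(X_dm.T @ X_dm / T)[::-1]

# percentage_i = lambda_i / sum_j lambda_j
explained = lam / lam.sum()
print(np.round(100 * explained, 2))  # % of total variance per PC

# Keep only as many PCs as needed to explain, e.g., 95% of the variance.
keep = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
print(f"keep the first {keep} of {N} Principal Components")
```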
Simple explanation with SVD