
Principal Component Analysis (PCA)



Core Principles

PCA is a linear dimensionality reduction method:

$$\text{linearly correlated variables data} \xRightarrow[\text{orthogonal transformation}]{\text{transformed}} \text{linearly uncorrelated variables data}$$

These linearly uncorrelated variables are the principal components; they retain most of the information in the original data representation while compressing it into fewer dimensions.

To simplify, in the 2D → 1D PCA example, the goal of the algorithm is to find the best line (principal component) onto which each 2D point $(x_i, y_i)$ is projected as a single number, such that the projections have the largest variance.
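
Stated slightly more formally (a sketch of the objective only; the symbols $w$, $p_i$, and $\bar p$ are introduced here for illustration): find the unit direction $w$ that maximises the variance of the projections of the points $p_i = (x_i, y_i)$,

$$\max_{\|w\| = 1} \; \frac{1}{n-1} \sum_{i=1}^{n} \left( w^{\top} (p_i - \bar p) \right)^{2},$$

where $\bar p$ is the sample mean of the points.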

PCA 2D to 1D



Since the PCA algorithm has already been implemented and optimised by engineers, the computation can be done directly with libraries such as scikit-learn; learning its mathematical principles and properties is optional.
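
For instance, a minimal sketch of the 2D → 1D case with scikit-learn (the toy data and variable names here are made up purely for illustration):

```python
# 2D -> 1D PCA with scikit-learn; the data below is synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 correlated 2D points (x_i, y_i): y is roughly 0.5*x plus noise
x = rng.normal(size=200)
points = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=200)])

pca = PCA(n_components=1)             # keep a single principal component
z = pca.fit_transform(points)         # each 2D point becomes one number
print(pca.components_)                # direction of the best-fit line
print(pca.explained_variance_ratio_)  # fraction of variance retained
```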

🧠 PCA by Correlation Matrix Eigendecomposition

  • Input: $m\times n$ sampled data matrix $X$.
  • Compute $R = \frac{1}{n-1}XX^T$.

    Compute the $k$ largest eigenvalues and their corresponding unit eigenvectors to construct the orthogonal matrix

    $V=\begin{bmatrix} v_1 & v_2 & \cdots & v_k \end{bmatrix}$,

    resulting in a $k\times n$ principal component matrix $Y = V^TX$ (see the NumPy sketch after this list).
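
A minimal NumPy sketch of this eigendecomposition route (an illustration only, not from the original note; it assumes the rows of $X$ have already been centred, and the name pca_eig is invented here):

```python
import numpy as np

def pca_eig(X, k):
    """X: m x n data matrix with zero-mean rows. Returns the k x n matrix Y = V^T X."""
    m, n = X.shape
    R = X @ X.T / (n - 1)                 # m x m covariance-type matrix
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh since R is symmetric
    order = np.argsort(eigvals)[::-1]     # sort eigenvalues in descending order
    V = eigvecs[:, order[:k]]             # m x k matrix of top-k unit eigenvectors
    return V.T @ X                        # k x n principal component matrix

# Usage on toy data: 3 variables, 100 samples, reduced to 2 components
X = np.random.default_rng(1).normal(size=(3, 100))
X = X - X.mean(axis=1, keepdims=True)     # centre each row
Y = pca_eig(X, k=2)
print(Y.shape)                            # (2, 100)
```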

🧠 PCA by SVD

  • Input: $m\times n$ sampled data matrix $X$, where each row has mean 0.
  • Define $X' = \frac{1}{\sqrt{n-1}}X^T$.

    Perform SVD on the matrix $X'$ and keep the top $k$ singular values and vectors:

    $X' = U\Sigma V^T$

    Each column of $V$ corresponds to a principal component, resulting in a $k\times n$ principal component matrix $Y = V^TX$ (see the NumPy sketch after this list).
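
Similarly, a minimal NumPy sketch of the SVD route (again an illustration only; the name pca_svd and the toy data are assumptions, not part of the original note):

```python
import numpy as np

def pca_svd(X, k):
    """X: m x n data matrix with zero-mean rows. Returns the k x n matrix Y = V^T X."""
    m, n = X.shape
    Xp = X.T / np.sqrt(n - 1)                              # X' = X^T / sqrt(n-1), shape n x m
    U, s, Vt = np.linalg.svd(Xp, full_matrices=False)      # X' = U Sigma V^T
    V = Vt.T[:, :k]                                        # columns of V are principal directions
    return V.T @ X                                         # k x n principal component matrix

# Usage on toy data: 3 variables, 100 samples, reduced to 2 components
X = np.random.default_rng(2).normal(size=(3, 100))
X = X - X.mean(axis=1, keepdims=True)                      # each row must have mean 0
Y = pca_svd(X, k=2)
print(Y.shape)                                             # (2, 100)
```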