Linear algebra is the backbone of deep learning. As Andrew Ng famously said,
“Linear algebra is the language of deep learning”.
Vectors and Matrices
Several types of mathematical objects are important in the study of linear algebra:
Scalars: a single number, in contrast with arrays, matrices, etc., which are collections of numbers.
Vector: An ordered list of numbers (scalars) often represented as a column.
e.g., a vector $\mathbf{x} \in \mathbb{R}^n$ can be written as $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$.
Matrix (Matrices): A rectangular array of numbers with size $m \times n$ ($m$ rows, $n$ columns), i.e., a 2-D array. We often denote a matrix as $A = (a_{ij})$, where $a_{ij}$ is the entry in the $i$-th row and $j$-th column.
e.g., a $2 \times 3$ matrix: $A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$. A column vector is a special case of a matrix (size $n \times 1$).
Tensors are generalizations of matrices to higher dimensions (used to store multi-dimensional data in deep learning).
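A minimal NumPy sketch of these objects (the array values are arbitrary examples, not taken from the text above):

```python
import numpy as np

s = 3.5                            # scalar: a single number
v = np.array([1.0, 2.0, 3.0])      # vector: an ordered list of numbers
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])         # matrix: a 2-D array with 3 rows, 2 columns
T = np.zeros((2, 3, 4))            # tensor: a higher-dimensional array

print(v.shape, A.shape, T.shape)   # (3,) (3, 2) (2, 3, 4)
```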
Matrix Operations
Transpose: the mirror image of the matrix across the main diagonal: $(A^\top)_{ij} = a_{ji}$.
e.g., $\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}^\top = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$.
Addition: for matrices $A$ and $B$ of the same size, $(A + B)_{ij} = a_{ij} + b_{ij}$ (add corresponding entries).
Scalar Multiplication: for a scalar $c$, $(cA)_{ij} = c \cdot a_{ij}$.
Matrix Multiplication: if $A$ is $m \times n$ and $B$ is $n \times p$, their product $C = AB$ is an $m \times p$ matrix with entries $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ (the dot product of row $i$ of $A$ with column $j$ of $B$).
Matrix multiplication is distributive over addition, $A(B + C) = AB + AC$, and associative, $A(BC) = (AB)C$, but it is NOT commutative in general: $AB \ne BA$.
However, the dot product of vectors is commutative: $\mathbf{x}^\top \mathbf{y} = \mathbf{y}^\top \mathbf{x}$.
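These operations are easy to check numerically; a small NumPy sketch (the matrices and vectors are arbitrary examples):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(A.T)                           # transpose: mirror across the main diagonal
print(A + B)                         # addition: entry-wise
print(2 * A)                         # scalar multiplication
print(A @ B)                         # matrix multiplication (rows of A times columns of B)
print(np.allclose(A @ B, B @ A))     # False: matrix multiplication is not commutative

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x @ y == y @ x)                # True: the dot product of vectors is commutative
```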
Up to now, we can introduce the system of linear equations, the main problem we are going to solve: $A\mathbf{x} = \mathbf{b}$, where $A$ is a known matrix, $\mathbf{b}$ is a known vector, and $\mathbf{x}$ is the vector of unknowns.
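For example, a small system $A\mathbf{x} = \mathbf{b}$ can be set up and solved directly with NumPy (the numbers are illustrative only):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])      # known coefficient matrix
b = np.array([3.0, 5.0])        # known right-hand side

x = np.linalg.solve(A, b)       # solve A x = b for the unknown vector x
print(x)                        # [0.8 1.4]
print(np.allclose(A @ x, b))    # True: the solution satisfies the system
```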
Additional Concepts
Identity Matrix $I_n$: mathematically, $(I_n)_{ij} = 1$ if $i = j$, and $0$ otherwise. It acts as the multiplicative identity: $I_m A = A$ and $A I_n = A$.
Simply, $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, or $I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
Inverse: For a square matrix $A$, the inverse $A^{-1}$ satisfies $A A^{-1} = A^{-1} A = I$. Only invertible (non-singular) matrices have inverses, which requires full rank (no zero eigenvalues; see rank below).
e.g., for a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $A^{-1} = \dfrac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$ (provided $ad - bc \ne 0$).
And we can indeed solve $\mathbf{x}$ from $A\mathbf{x} = \mathbf{b}$: $\mathbf{x} = A^{-1}\mathbf{b}$.
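A quick check of the $2 \times 2$ inverse formula (a sketch with an arbitrary matrix; in practice `np.linalg.solve` is preferred over forming the inverse explicitly):

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
a, b, c, d = A.ravel()

A_inv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)   # 2x2 inverse formula
print(np.allclose(A_inv, np.linalg.inv(A)))              # True
print(np.allclose(A @ A_inv, np.eye(2)))                 # True: A A^{-1} = I
```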
Determinant ($\det(A)$ or $|A|$): a scalar value defined for square matrices that encodes the volume scaling factor of the linear transformation and whether it flips orientation.
e.g., for $A$ as above, $\det(A) = ad - bc$. For larger matrices, it is computed via a recursive formula or row reduction, e.g., the cofactor expansion along the first row:
$\det(A) = \sum_{j=1}^{n} (-1)^{1+j}\, a_{1j} \det(M_{1j})$,
where $M_{1j}$ is the submatrix obtained by deleting row $1$ and column $j$.
Trace: The trace of a square matrix $A$ is the sum of its diagonal entries: $\operatorname{tr}(A) = \sum_{i} a_{ii}$. Notable property: the trace equals the sum of the eigenvalues of $A$ (counting multiplicity), and it is invariant under change of basis.
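Both quantities are easy to verify numerically; a small sketch (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

print(np.linalg.det(A))     # ~10.0 = ad - bc = 4*6 - 7*2
print(np.trace(A))          # 10.0 = 4 + 6 (sum of diagonal entries)

eigvals = np.linalg.eigvals(A)
print(np.isclose(np.trace(A), eigvals.sum()))   # True: trace = sum of eigenvalues
```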
Rank: The rank of a matrix is the number of linearly independent rows (which equals the number of linearly independent columns). It indicates the dimension of the subspace spanned by its columns (the column space). Full rank means rank $= \min(m, n)$; for an $n \times n$ square matrix, full rank (rank $n$) means the matrix is invertible. Low rank means the matrix's rows/columns are linearly dependent (some redundancy in information).
e.g., $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}$ has rank 2: the first two columns are linearly independent, so the rank is at least 2, but since the third is a linear combination of the first two (their sum), the three columns are linearly dependent, so the rank must be less than 3.
How to compute the rank of a matrix: reduce it by Gaussian elimination. For the example above, subtracting the first two rows from the third ($R_3 \leftarrow R_3 - R_1 - R_2$) gives $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$. The final matrix (in reduced row echelon form) has two non-zero rows, and thus the rank of the matrix is 2.
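The same result can be checked with NumPy (assuming the $3 \times 3$ example above, whose third column is the sum of the first two):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])    # third column = first column + second column

print(np.linalg.matrix_rank(A))    # 2: only two linearly independent columns
```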
Norms are used to measure the size of a vector (or matrix). The $L^p$ norm is $\|\mathbf{x}\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$, for $p \ge 1$. Some commonly used norms:
Euclidean Norm ($L^2$): $\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$, often simply denoted $\|\mathbf{x}\|$; generally known as the default 'distance' in geometry.
$L^1$ norm: $\|\mathbf{x}\|_1 = \sum_i |x_i|$; commonly used in machine learning.
Max Norm ($L^\infty$): $\|\mathbf{x}\|_\infty = \max_i |x_i|$.
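A small NumPy sketch of the three norms (the vector is an arbitrary example):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

print(np.linalg.norm(x, 2))        # L2 (Euclidean) norm: sqrt(9 + 16 + 1) ≈ 5.099
print(np.linalg.norm(x, 1))        # L1 norm: |3| + |-4| + |1| = 8
print(np.linalg.norm(x, np.inf))   # max norm: max(|x_i|) = 4
```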
Special matrices
Symmetric matrix: $A = A^\top$.
Unit Vector: a vector with unit norm, $\|\mathbf{x}\|_2 = 1$.
Orthogonal: if $\mathbf{x}^\top \mathbf{y} = 0$, then $\mathbf{x}$ and $\mathbf{y}$ are orthogonal vectors; if $A^\top A = A A^\top = I$, then $A$ is an orthogonal matrix, and also $A^{-1} = A^\top$.
The inverse of an orthogonal matrix is therefore very cheap and easy to compute.
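For example, a 2-D rotation matrix is orthogonal, so its inverse is just its transpose (a minimal sketch):

```python
import numpy as np

theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation matrix (orthogonal)

print(np.allclose(Q.T @ Q, np.eye(2)))            # True: Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))         # True: the inverse is just the transpose
```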
For a square matrix $A$, a non-zero vector $\mathbf{v}$ is an Eigenvector if $A\mathbf{v} = \lambda\mathbf{v}$ for some scalar $\lambda$. Here $\lambda$ is called the eigenvalue corresponding to $\mathbf{v}$.
To determine the eigenvalues of $A$, solve the characteristic equation $\det(A - \lambda I) = 0$, and find each eigenvector by solving $(A - \lambda I)\mathbf{v} = \mathbf{0}$.
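Numerically, eigenpairs can be obtained with `np.linalg.eig`; a sketch verifying $A\mathbf{v} = \lambda\mathbf{v}$ for an arbitrary example matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)       # columns of eigvecs are the eigenvectors
for i in range(len(eigvals)):
    v, lam = eigvecs[:, i], eigvals[i]
    print(np.allclose(A @ v, lam * v))    # True: A v = lambda v for each pair
```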
Eigendecomposition
Assuming $A$ has $n$ linearly independent eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$, given $V = [\mathbf{v}_1, \dots, \mathbf{v}_n]$ and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, the eigendecomposition of $A$ is then: $A = V \Lambda V^{-1}$.
For symmetric matrices, $A = Q \Lambda Q^\top$,
where $Q$ is an orthogonal matrix composed of the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix of eigenvalues. Geometrically, the transformation can be decomposed as a scaling by $\lambda_i$ (length) in the direction $\mathbf{v}_i$.
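For a symmetric matrix, `np.linalg.eigh` returns the orthogonal $Q$ and the eigenvalues directly; a sketch reconstructing $A = Q \Lambda Q^\top$ (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric matrix

eigvals, Q = np.linalg.eigh(A)           # Q is orthogonal; eigvals go on the diagonal of Lambda
Lam = np.diag(eigvals)

print(np.allclose(A, Q @ Lam @ Q.T))     # True: A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q is orthogonal
```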
Singular value decomposition (SVD) is a fundamental matrix factorization. It states that any real $m \times n$ matrix $A$ can be factored as $A = U \Sigma V^\top$, where:
$U$ is an $m \times m$ orthogonal matrix (a rotation or reflection);
$\Sigma$ is an $m \times n$ diagonal matrix with nonnegative entries, which scales the axes by the singular values;
$V$ is an $n \times n$ orthogonal matrix (a second rotation or reflection).
The diagonal entries $\sigma_i$ of $\Sigma$ are the singular values of $A$. They are conventionally ordered $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$.
SVD says that $A$ = (orthogonal rotation/reflection) $\times$ (scaling) $\times$ (orthogonal rotation/reflection).
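NumPy computes the SVD directly; a sketch factoring a small rectangular matrix (arbitrary values) and reconstructing it:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])                # 3 x 2 matrix

U, s, Vt = np.linalg.svd(A)               # s holds the singular values, in descending order
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)      # embed the singular values in a 3 x 2 diagonal matrix

print(np.allclose(A, U @ Sigma @ Vt))     # True: A = U Sigma V^T
```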
🧠 How to compute SVD:
The following algorithm is mathematically valid, but in practice this simple method can lead to numerical instability and increased computational cost.
1. Form $A^\top A$ and solve the eigenequation $(A^\top A)\,\mathbf{v} = \lambda \mathbf{v}$, which yields eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$ and eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_n$.
2. Normalize the eigenvectors $\mathbf{v}_i$ into unit vectors and collect them as columns to get $V = [\mathbf{v}_1, \dots, \mathbf{v}_n]$.
3. The singular values are $\sigma_i = \sqrt{\lambda_i}$; therefore the diagonal matrix is $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$ (padded with zeros to size $m \times n$ if needed).
4. For the first $r$ positive singular values of $A$, set $\mathbf{u}_i = \frac{1}{\sigma_i} A \mathbf{v}_i$. Then find a set of orthonormal bases of the null space of $A^\top$ to fill in the remaining columns $\mathbf{u}_{r+1}, \dots, \mathbf{u}_m$, giving $U = [\mathbf{u}_1, \dots, \mathbf{u}_m]$.
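A minimal sketch of this procedure (assuming a small full-column-rank matrix, so all singular values are positive and the null-space step for $A^\top$ is not needed for a thin SVD; `svd_via_eig` is a hypothetical helper name, and `np.linalg.svd` remains the numerically stable choice):

```python
import numpy as np

def svd_via_eig(A):
    """Thin SVD via the eigendecomposition of A^T A (assumes full column rank)."""
    eigvals, V = np.linalg.eigh(A.T @ A)          # step 1: eigenpairs of A^T A (ascending order)
    order = np.argsort(eigvals)[::-1]             # reorder eigenvalues to descending
    eigvals, V = eigvals[order], V[:, order]      # step 2: columns of V are unit eigenvectors
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))  # step 3: singular values sigma_i = sqrt(lambda_i)
    U = (A @ V) / sigma                           # step 4: u_i = A v_i / sigma_i (positive sigmas only)
    return U, sigma, V

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])
U, sigma, V = svd_via_eig(A)
print(np.allclose(A, U @ np.diag(sigma) @ V.T))                 # True: A = U Sigma V^T
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))   # matches NumPy's singular values
```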