• Notes

  • Vector space: a set with special structure
    • can be finite- or infinite-dimensional; we restrict to finite dimensions below
    • span, linear independence, subspace, direct sum, and finally basis
    • dimension is defined as the number of vectors in a basis
  • Linear maps from V to W.

    • they form a vector space themselves when equipped with pointwise addition and scalar multiplication
    • fundamental theorem of linear algebra - relationship between fundamental sub-spaces
      • fundamental subspaces:
        • kernel, a subspace of V. So dim(kernel) ≤ dim(V)
        • image/range, a subspace of W. So dim(image) ≤ dim(W)
      • dim(V) = dim(kernel) + dim(image)
        • Among the dim(V) input dimensions, dim(kernel(A)) get zeroed out; what's left is dim(V) - dim(kernel(A)) = dim(range(A)) ≤ dim(W). A quick numerical check follows this list.
      • in/sur-jectivity depends on which domain and codomain we talk about
      • trivial kernel {0} ⇔ injective
      • range(A) = W ⇔ surjective
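      • A minimal numpy sketch of the rank-nullity check mentioned above (the matrix is a made-up example, not from the notes):

        ```python
        import numpy as np

        # A deliberately rank-deficient 4x5 matrix (made-up example):
        # row 3 = row 1 + row 2, row 4 = 2 * row 1, so rank = 2
        A = np.array([[1., 2., 0., 1., 3.],
                      [0., 1., 1., 0., 1.],
                      [1., 3., 1., 1., 4.],
                      [2., 4., 0., 2., 6.]])

        n = A.shape[1]                      # dim(V): input dimension
        rank = np.linalg.matrix_rank(A)     # dim(range(A))
        nullity = n - rank                  # dim(kernel(A)) by rank-nullity
        print(rank, nullity)                # 2 3, and 2 + 3 == 5 == dim(V)
        ```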
    • Isomorphism:
      • “iso” means same, “morph” means shape.
      • any two finite-dimensional vector spaces of the same dimension are isomorphic
      • an isomorphism is a bijective linear map
      • echo CategoryTheory
  • Matrix of linear map and matrix product

    • “left row times right column”: each operation is an inner product, producing one scalar in the result
    • left matrix's columns linearly combined using the right matrix's column:
      • can be considered a vectorized version of “left row times right column”
      • note the number of columns equals the input space dimension
      • the k-th column holds the coordinates, in the output space's basis, of the image of the k-th standard basis vector of the input space
      • similarly, the k-th column of a matrix product AB is A times the k-th column of B (see the sketch after this list)
    • leverage matrices to compute the dimension of the space of linear maps from dim n to dim m:
      • choose one cell to be 1, leaving the others zero. These matrices form a basis. There are m times n cells, so the dimension is mn.
    • the space of m×n matrices is isomorphic to the space of linear maps from a dim-n space to a dim-m space.
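    • A small numpy sketch of the column-wise view of matrix products (random matrices, purely illustrative):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      A = rng.standard_normal((3, 4))   # maps a dim-4 space into a dim-3 space
      B = rng.standard_normal((4, 2))
      AB = A @ B

      # k-th column of AB = A applied to the k-th column of B,
      # i.e. a linear combination of A's columns weighted by B[:, k]
      for k in range(B.shape[1]):
          assert np.allclose(AB[:, k], A @ B[:, k])

      # k-th column of A = image of the k-th standard basis vector
      e0 = np.eye(4)[:, 0]
      assert np.allclose(A @ e0, A[:, 0])
      ```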
  • Quotient space: TBD. might be related to machine-learning/dim-reduction
  • Dual map, linear functionals.
    • a linear functional maps from a vector space to the scalars
    • the dual map sends a functional on the target space W to a functional on the source space V (by precomposing with the map)
  • operators: linear maps whose domain and codomain are identical
    • Projection operator: P^2 = P, so any projection other than the identity is singular
      • understanding Px: 1) use the entries of x as coefficients to linearly combine the column vectors of P; 2) take the inner product of each row vector of P with x
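      • A quick sketch of an orthogonal projection onto a line (the vector u is an arbitrary choice):

        ```python
        import numpy as np

        u = np.array([1., 2., 2.]) / 3.0   # unit vector; ||u|| = 1
        P = np.outer(u, u)                 # P = u u^T projects onto span{u}

        assert np.allclose(P @ P, P)       # idempotent: P^2 = P
        print(np.linalg.det(P))            # ~0: P collapses the orthogonal
                                           # complement of u, hence singular
        ```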
  • Solving linear systems

    • fundamental theorem of linear maps
    • NOW the determinant, defined only for square matrices, is generalized via the SVD
  • Eigen space:

    • motivation: find invariant subspaces so we can study linear maps more easily
    • 1-D invariant subspace → eigenvalue and eigenvector
    • ALL n×n square matrices have n complex eigenvalues (counted with multiplicity)
      • For those diagonalizable:
        • Real eigenvalues correspond to (non-uniform) scaling in the eigen directions (reflection being the special case of scaling by -1);
        • (conjugate pairs of) complex eigenvalues correspond to scaling + rotation
          • the scaling factor is the magnitude of the complex number; the rotation angle is the argument of one of the pair. See the sketch after this list.
            • Every rotation happens within a 2D plane. And note C^1 is isomorphic to R^2 as real vector spaces:
              • A single complex number can be visualized by a vector in the complex plane. Note that the set of complex numbers is a one-dimensional vector space over itself (i.e. over C, as a complex vector space). So in that sense it's more like a “complex number line”, akin to the real number line as a one-dimensional vector space over itself (i.e. over R, as a real vector space). We visualize the complex plane as a plane, however, because C is two-dimensional as a real vector space over R. And we as humans (at least, the vast majority of us) are much better at visualizing real dimensions, and only up to three of them.

            • note Euler’s formula
      • For those non-diagonalizable:
        • a shear transformation (e.g. shearing to the right)
      • translation is impossible for a linear map. The possible effects are just scaling and rotation:
        • a shear can be broken down into scaling + rotation, but awkwardly
        • reflection is just scaling by -1
        • projection is scaling by zero
        • identity, of course
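      • A sketch of the rotation + scaling reading of complex eigenvalues (angle and scale picked arbitrarily):

        ```python
        import numpy as np

        theta, scale = np.pi / 6, 2.0   # rotate by 30 degrees, scale by 2
        R = scale * np.array([[np.cos(theta), -np.sin(theta)],
                              [np.sin(theta),  np.cos(theta)]])

        lam = np.linalg.eigvals(R)      # conjugate pair scale * exp(+/- i*theta)
        print(np.abs(lam))              # [2. 2.]       -> the scaling factor
        print(np.angle(lam))            # [pi/6, -pi/6] -> the rotation angle
        ```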
    • The characteristic polynomial is calculated from a matrix but is ultimately about the operator, so it doesn't depend on the choice of basis.
      • geometric multiplicity ≤ algebraic multiplicity
    • A diagonalized matrix A makes it easy to compute polynomials of A (see the sketch after this list)
    • any rotation can be represented as a product of a series of elementary rotations, each acting in one 2D plane
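    • The sketch below evaluates a polynomial of a diagonalizable A by applying it to the eigenvalues only (matrix and polynomial are arbitrary examples):

      ```python
      import numpy as np

      A = np.array([[2., 1.],
                    [1., 2.]])         # symmetric, hence diagonalizable
      lam, V = np.linalg.eig(A)        # A = V diag(lam) V^{-1}

      def p(x):
          return x**3 - 2*x + 1        # arbitrary polynomial p(x) = x^3 - 2x + 1

      # p(A) = V diag(p(lam)) V^{-1}: apply p to the eigenvalues only
      pA = V @ np.diag(p(lam)) @ np.linalg.inv(V)
      assert np.allclose(pA, A @ A @ A - 2*A + np.eye(2))
      ```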
  • Inner-product space:

    • to talk about distance and angles
    • minimization via projection
    • Riesz Representation Theorem:
      • functionals on a Hilbert (complete inner-product) space correspond exactly to inner products with a fixed vector in that space.
        • i.e. f(v) is bijectively mapped to <v, u> where u uniquely exists for each f.
      • LATER related application: Gaussian Process Regression
    • Adjoint and conjugate transpose
      • <Tu, v> = <u, T*v>
      • the existence of the right-hand side is an immediate consequence of the Riesz representation theorem: for each v, the functional u ↦ <Tu, v> is represented by a unique vector, which we define to be T*v.
      • self-adjoint i.e. Hermitian
        • Can be diagonalized with all eigenvalues real and all eigenvectors orthogonal (sketch below)
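        • A numpy sketch of the self-adjoint case (a random symmetrized matrix):

          ```python
          import numpy as np

          rng = np.random.default_rng(1)
          M = rng.standard_normal((4, 4))
          H = (M + M.T) / 2                  # symmetric = real self-adjoint

          lam, Q = np.linalg.eigh(H)         # eigh assumes Hermitian input
          print(lam)                         # all real
          assert np.allclose(Q.T @ Q, np.eye(4))         # orthonormal eigenvectors
          assert np.allclose(Q @ np.diag(lam) @ Q.T, H)  # H = Q diag(lam) Q^T
          ```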
    • Polar decomposition: A = UP
      • U: unitary, an isometry, non-singular. The rotation part.
      • P = sqrt(A*A): positive semi-definite, the “modulus” of A. The scaling part. (Sketch below.)
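      • A sketch computing the polar decomposition from the SVD (random square matrix):

        ```python
        import numpy as np

        rng = np.random.default_rng(2)
        A = rng.standard_normal((3, 3))

        # From A = U S V^T we get A = (U V^T)(V S V^T) = W P
        U, s, Vt = np.linalg.svd(A)
        W = U @ Vt                     # orthogonal factor: the rotation part
        P = Vt.T @ np.diag(s) @ Vt     # P = sqrt(A^T A): the scaling part

        assert np.allclose(W @ P, A)
        assert np.allclose(W.T @ W, np.eye(3))           # W is orthogonal
        assert np.all(np.linalg.eigvalsh(P) >= -1e-10)   # P is PSD
        ```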
    • SVD
      • it refines the polar decomposition by also diagonalizing the positive semi-definite factor
      • diagonalization w.r.t. two different orthonormal (unitary) bases
      • singular values are just the square roots of the eigenvalues of A*A
      • full SVD vs. reduced/economic SVD (keeping only the singular values above a threshold)
      • geometric interpretation: unit ball to ellipsoid
      • implementation:
        • first eigendecompose A*A. The square roots of its eigenvalues are the singular values; its eigenvectors are the columns of V
        • then calculate U: u_k = A v_k / σ_k (see the sketch below)
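        • A naive sketch of this recipe (fine for intuition, but not numerically robust: forming A^T A squares the condition number):

          ```python
          import numpy as np

          rng = np.random.default_rng(3)
          A = rng.standard_normal((5, 3))

          lam, V = np.linalg.eigh(A.T @ A)   # A^T A is symmetric PSD
          idx = np.argsort(lam)[::-1]        # sort descending
          lam, V = lam[idx], V[:, idx]
          s = np.sqrt(np.maximum(lam, 0))    # singular values
          U = (A @ V) / s                    # u_k = A v_k / sigma_k (full rank assumed)

          assert np.allclose(U @ np.diag(s) @ V.T, A)
          ```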
      • linear systems: the relative error in the solution can be up to the condition number times the relative error in the input
        • the amplification factor is the condition number, the ratio between the max and min singular values
      • OLS linear regression can be done via the SVD (pseudo-inverse):
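      • A minimal OLS-via-SVD sketch (synthetic data, made-up coefficients):

        ```python
        import numpy as np

        rng = np.random.default_rng(4)
        X = rng.standard_normal((50, 3))
        beta_true = np.array([1.0, -2.0, 0.5])
        y = X @ beta_true + 0.01 * rng.standard_normal(50)

        # pseudo-inverse from the reduced SVD: beta = V diag(1/s) U^T y
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        beta = Vt.T @ ((U.T @ y) / s)

        assert np.allclose(beta, np.linalg.lstsq(X, y, rcond=None)[0])
        ```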
      • PCA:
        • first demean
        • can eigendecompose Cov(X)
        • or can SVD the demeaned X directly (sketch below)
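        • A sketch showing both PCA routes agree (random synthetic data):

          ```python
          import numpy as np

          rng = np.random.default_rng(5)
          X = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 4))
          Xc = X - X.mean(axis=0)            # demean first

          # route 1: eigendecompose the covariance matrix
          lam, V = np.linalg.eigh(Xc.T @ Xc / (len(Xc) - 1))
          lam, V = lam[::-1], V[:, ::-1]     # descending order

          # route 2: SVD the demeaned data directly
          U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

          assert np.allclose(lam, s**2 / (len(Xc) - 1))  # same variances
          assert np.allclose(np.abs(V), np.abs(Vt.T))    # same directions up to sign
          ```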
      • the absolute value of the determinant is the product of the singular values
  • Change of basis

    • Represent a vector under another basis:
      • If B is a matrix whose columns are the new basis vectors, then given x, how do we get x_B? Here x holds the vector's coordinates under the standard basis
        • Say x = B x_B: x is the linear combination of the columns b_k of B with coefficients taken from x_B. Therefore, x_B = B^{-1} x (sketch below)
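        • A tiny numpy sketch of this (basis and vector are arbitrary picks):

          ```python
          import numpy as np

          B = np.array([[1., 1.],
                        [0., 1.]])     # columns are the new basis vectors
          x = np.array([3., 2.])       # coordinates in the standard basis

          x_B = np.linalg.solve(B, x)  # x = B x_B  =>  x_B = B^{-1} x
          assert np.allclose(B @ x_B, x)
          print(x_B)                   # [1. 2.]: x = 1*b1 + 2*b2
          ```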
    • Represent a linear map under different source and destination basis:
        • we get a similar matrix if we set the source basis and the destination basis to the same B
          • i.e. making input/output basis the same, then changing basis
          • we get A_B = B^{-1} A B, where A represents the map in the standard basis
      • A_B = B^{-1} A B is easier to derive if we notice B e_k = b_k, where e_k is the k-th standard basis vector: B converts new-basis coordinates into standard coordinates (sketch below)
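      • A sketch of the similar-matrix computation (map and basis are arbitrary examples):

        ```python
        import numpy as np

        A = np.array([[2., 1.],
                      [0., 3.]])        # the map, in standard coordinates
        B = np.array([[1., 1.],
                      [1., 2.]])        # columns: a new (invertible) basis

        A_B = np.linalg.inv(B) @ A @ B  # the same map, in the new basis

        v_B = np.array([1., -1.])       # a vector, in new-basis coordinates
        # applying the map in either coordinate system must agree:
        assert np.allclose(B @ (A_B @ v_B), A @ (B @ v_B))
        ```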
  • Quadratic form

    • bilinear form: linear in each variable separately
    • quadratic form expressed by a matrix: q(x) = x^T A x
      • pick a symmetric matrix A
      • x^T A x is equivalent with x^T ((A + A^T)/2) x, so any A can be replaced by its symmetric part
    • Check P.D.
      • det(A_i) > 0 for all i from 1 to n, where A_i is the top-left i×i sub-matrix of A (Sylvester's criterion)
      • all eigenvalues > 0
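      • A sketch of both P.D. checks on a classic positive-definite matrix:

        ```python
        import numpy as np

        A = np.array([[ 2., -1.,  0.],
                      [-1.,  2., -1.],
                      [ 0., -1.,  2.]])

        # 1) all leading principal minors positive (Sylvester's criterion)
        minors = [np.linalg.det(A[:i, :i]) for i in range(1, 4)]
        print(minors)                      # [2.0, 3.0, 4.0] (approximately)

        # 2) all eigenvalues positive
        print(np.linalg.eigvalsh(A) > 0)   # [ True  True  True]
        ```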
  • Determinant and Trace where A is square

    • Det
      • Zero: some input dimension collapsed; rank < n; the kernel is non-trivial.
      • Non-zero: the magnitude is the overall volume scaling = the product of the absolute values of the eigenvalues; the sign tells whether orientation is preserved or flipped
      • always equal to the product of the eigenvalues (taken over the complex numbers)
      • equals (-1)^n times the constant term of the characteristic polynomial det(tI - A)
    • Trace
      • equals minus the coefficient of the t^(n-1) term in the characteristic polynomial; also the sum of the eigenvalues
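      • A sketch verifying these identities on a small matrix (arbitrary example):

        ```python
        import numpy as np

        A = np.array([[1., 2.],
                      [3., 4.]])
        n = len(A)

        lam = np.linalg.eigvals(A)
        coeffs = np.poly(A)            # char poly det(tI - A), highest degree first

        assert np.isclose(np.prod(lam), np.linalg.det(A))  # det = product of eigenvalues
        assert np.isclose(np.sum(lam), np.trace(A))        # trace = sum of eigenvalues
        assert np.isclose(coeffs[1], -np.trace(A))         # minus the t^(n-1) coefficient
        assert np.isclose(coeffs[-1], (-1)**n * np.linalg.det(A))  # constant term
        ```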
  • Numerical methods

    • Jordan Canonical Form: not numerically stable. Intuitively, diagonalizable matrices are dense, so arbitrarily small perturbations change the Jordan structure
    • eigendecomposition, inversion and LU decomposition are not stable when the matrix is ill-conditioned
    • SVD is stable
    • QR is stable
    • Find the largest eigenvalue: power iteration. A^k x / ||A^k x|| gets closer to the dominant eigenvector as k → ∞ (sketch after this list)
      • to find the smallest eigenvalue, use the same iteration with A^{-1} (inverse iteration)
      • to find the full eigendecomposition: the QR algorithm (repeated QR decompositions) under the hood
      • finding roots of the characteristic polynomial instead: too slow, and no general closed-form solution for degree ≥ 5
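    • A minimal power-iteration sketch (matrix and iteration count are arbitrary):

      ```python
      import numpy as np

      def power_iteration(A, iters=500):
          x = np.random.default_rng(6).standard_normal(A.shape[0])
          for _ in range(iters):
              x = A @ x
              x /= np.linalg.norm(x)   # renormalize to avoid overflow
          return x @ A @ x, x          # Rayleigh quotient, eigenvector

      A = np.array([[4., 1.],
                    [2., 3.]])         # eigenvalues 5 and 2
      lam, x = power_iteration(A)
      print(lam)                       # ~5.0, the dominant eigenvalue
      # for the smallest eigenvalue, iterate with np.linalg.inv(A)
      # and take the reciprocal of the result
      ```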
  • Books

  • Axler, Sheldon. Linear Algebra Done Right. Undergraduate Texts in Mathematics. Springer (2015).
  • Treil, Sergei. Linear Algebra Done Wrong (2014). [LADW-2014-09.pdf]