A record of my thinking at the University of California, Berkeley (UC Berkeley).

Linear Algebra 2

Set topology


 \mathcal{P} = \left\{ x^{(1)}, ..., x^{(n)} \right\}
* Linear hull
 x = \sum \lambda_{i} x^{(i)}
* Affine hull
 x = \sum \lambda_{i} x^{(i)} , where  \sum \lambda_{i} =1
* Convex hull
 x = \sum \lambda_{i} x^{(i)} , where  \sum \lambda_{i} =1 and  \lambda_{i} \geq 0
* Conic hull
 x = \sum \lambda_{i} x^{(i)} , where  \lambda_{i} \geq 0
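As a small concrete illustration (a numpy sketch with made-up points and coefficients, not from the notes), the four hulls differ only in the constraints placed on the coefficients  \lambda_{i}:

```python
import numpy as np

# Three points x^(1), x^(2), x^(3) in R^2 (hypothetical data for illustration).
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

lam = np.array([0.2, 0.3, 0.5])    # coefficients lambda_i
x = lam @ P                        # x = sum_i lambda_i x^(i)

# Coefficient constraints distinguish the hulls:
assert np.isclose(lam.sum(), 1.0)  # sum = 1: affine hull condition
assert np.all(lam >= 0)            # nonnegative: conic hull condition
# Both conditions hold, so x lies in the convex hull of the points.
print(x)  # [0.3 0.5], inside the triangle with vertices in P
```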


Optimality conditions

* Unconstrained problem:
 \nabla f_{0}(x)=0
* Equality-constrained problem:
 Ax=b, and  \exists \nu \in \mathbb{R}^{m}:  \nabla f_{0}(x)+A^{T}\nu = 0
* Inequality-constrained problem:
 \lambda_{i} \geq 0,  i \in \mathcal{A}(x)
 \nabla f_{0}(x) + \sum_{i \in \mathcal{A}(x)} \lambda_{i} \nabla f_{i}(x)=0,
where  \mathcal{A}(x) is the set of active constraints.


Consider a primal problem in standard form:  p^{*} = \min_{x \in \mathbb{R}^{n}} f_{0}(x), s.t.  f_{i}(x) \leq 0, i=1, ..., m,  h_{i}(x) = 0, i = 1, ..., q.


Lagrangian

 \mathcal{L}(x, \lambda, \nu) \doteq f_{0}(x) + \sum_{i=1}^{m}\lambda_{i}f_{i}(x) + \sum_{i=1}^{q}\nu_{i}h_{i}(x),
where  \lambda, \nu are called the Lagrange multipliers, or dual variables, of the problem.

Lagrange dual function

 g(\lambda, \nu) = \min_{x \in \mathcal{D}} \mathcal{L}(x, \lambda, \nu), where
 g(\lambda, \nu): \mathbb{R}^m \times \mathbb{R}^q \rightarrow \mathbb{R}.

Lower bound property of the dual function

The dual function  g(\lambda, \nu) is jointly concave in  (\lambda, \nu). Moreover, it holds that
 g(\lambda, \nu) \leq p^{*}, \ \forall \lambda \geq 0, \forall \nu.

Dual optimization problem and weak duality

 d^{*} = \max_{\lambda, \nu} g(\lambda, \nu) s.t.  \lambda \geq 0.
It is remarkable that the dual problem is always a convex optimization problem, even when the primal problem is not convex. The weak duality property of the dual problem is  d^{*} \leq p^{*}.
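A quick numeric check of weak duality on a toy problem of my own (not from the notes): take  \min x^2 s.t.  1 - x \leq 0, so  p^{*} = 1 at  x = 1. The dual function is  g(\lambda) = \min_x x^2 + \lambda(1-x) = \lambda - \lambda^2/4, and every dual value stays below  p^{*}:

```python
import numpy as np

# Toy primal: min x^2  s.t.  1 - x <= 0, so p* = 1 at x = 1.
p_star = 1.0

def g(lam):
    # g(lam) = min_x [x^2 + lam*(1 - x)]; the minimizer is x = lam/2.
    return lam - lam**2 / 4.0

lams = np.linspace(0.0, 10.0, 1001)
vals = g(lams)

assert np.all(vals <= p_star + 1e-12)   # lower-bound property: g <= p*
d_star = vals.max()                     # attained at lam = 2
print(d_star)                           # 1.0, so strong duality holds here
```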

Strong duality and Slater's condition for convex programs

Suppose the first  k \leq m of the convex  f_{i} are affine. If there exists a point  x \in relint \ \mathcal{D} such that
 f_{1}(x) \leq 0, ..., f_{k}(x) \leq 0, f_{k+1}(x)<0, ..., f_{m}(x)<0,
  h_{1}(x) = 0, ..., h_{q}(x) = 0,
then strong duality holds between the primal and dual problems, that is,  p^{*}=d^{*}.

Complementary slackness

A primal inequality constraint and the corresponding dual variable cannot be slack simultaneously:
 \lambda_{i} f_{i}(x)=0
If  f_{i}(x) < 0, then it must be that  \lambda_{i} = 0. If  \lambda_{i} > 0, then it must be that  f_{i}(x) = 0.


Linear Algebra 1

Properties of trace and rank

 tr(cA+dB) = ctr(A) + dtr(B)
 tr(AB) = tr(BA)
 x^{T}Ax = tr(x^{T}Ax)= tr(Axx^{T})
 tr(A) = \sum_{i=1}^{n} \lambda_i(A)
 rank(AB) \leq rank(A), rank(AB) \leq rank(B)
 rank(AA^{T}) = rank(A^{T}A) = rank(A)
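These identities are easy to sanity-check numerically (a numpy sketch on random matrices of my choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

# Linearity and the cyclic property of the trace.
assert np.isclose(np.trace(2*A + 3*B), 2*np.trace(A) + 3*np.trace(B))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
# A scalar equals its own trace, and tr(x^T A x) = tr(A x x^T).
assert np.isclose(x @ A @ x, np.trace(A @ np.outer(x, x)))
# tr(A) equals the sum of the eigenvalues.
assert np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real)

# Rank of Gram matrices: rank(CC^T) = rank(C^T C) = rank(C).
C = rng.standard_normal((5, 3))    # rank 3 almost surely
assert np.linalg.matrix_rank(C @ C.T) == np.linalg.matrix_rank(C.T @ C) == 3
```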


Vector norm

  • Definition
     ||x|| \geq 0, \forall x \in \chi
     ||x|| = 0 if and only if  x = 0
     ||x+y|| \leq ||x|| + ||y||, \forall x, y \in \chi
     ||\alpha x|| = |\alpha| \ ||x|| for any scalar  \alpha and any  x \in \chi.
  • Examples
     ||x||_{1} = \sum_{k} |x_{k}|
     ||x||_{\infty} = \max_{k} |x_{k}|

Matrix Norm

  • Frobenius Norm
     ||A||_{F} = \sqrt{tr(AA^{T})}= \sqrt{\sum_{i} \sum_{j} |a_{ij}|^{2}}=\sqrt{\sum_{i} \lambda_{i}(AA^{T})}
     \forall x \in \mathbb{R}^{n}, ||Ax||_{2} \leq ||A||_{F}||x||_{2}
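A short numpy check of the three equivalent expressions for the Frobenius norm and of the bound on  ||Ax||_{2} (random matrix of my choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
x = rng.standard_normal(5)

fro = np.sqrt(np.trace(A @ A.T))           # ||A||_F via the trace
assert np.isclose(fro, np.linalg.norm(A, 'fro'))
assert np.isclose(fro, np.sqrt((np.abs(A)**2).sum()))
assert np.isclose(fro, np.sqrt(np.linalg.eigvalsh(A @ A.T).sum()))

# The bound ||Ax||_2 <= ||A||_F ||x||_2.
assert np.linalg.norm(A @ x) <= fro * np.linalg.norm(x) + 1e-12
```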

Cauchy-Schwarz inequality and definition of angles

 \forall \ x, y \in \mathbb{R}^n: \ |x^{T}y| \leq ||x||_{2}||y||_{2}
We can define the corresponding angle as  \theta such that
 \cos \theta = \frac{x^{T}y}{||x||_{2}||y||_{2}}.

Range, nullspace, and rank

 \mathit{R}(A) := \left \{Ax: x \in \mathbb{R}^{n} \right \}
 \mathit{N}(A) := \left \{x \in \mathbb{R}^{n}: Ax = 0 \right \}
The rank of matrix A is the dimension of its range.

Matrix description of subspaces

Symmetric matrix

 A^{T} = A
* Diagonal matrix
* Dyad  uu^{T}
Any quadratic form can be written as
 q(x) = x^{T}Ax, where  A \in \mathit{S}^{n} (i.e.,  A is symmetric).

Congruence transformations

For any matrix  A \in \mathbb{R}^{m, n} it holds that:
*  A^{T}A \succeq 0, and  A A^{T} \succeq 0;
*  A^{T}A \succ 0 if and only if  A is full-column rank, i.e.,  rank A  = n;
*  AA^{T} \succ 0 if and only if  A is full-row rank, i.e.,  rank A = m.

Eigenvalue and eigenvector

  •  \sum \lambda_i = tr(A)
  •  \prod \lambda_i = det(A)
  • Therefore  det(A) = 0 means at least one eigenvalue is 0. Also, A is invertible if and only if  \lambda_i \neq 0 \ \forall \ i
  • Eigenvalues of symmetric matrices are real. If  \mathbf{A} is positive semi-definite, then  \lambda_{i} \geq 0
  • Eigenvalues of positive definite matrices are positive. If  \mathbf{A} is positive definite, then  \lambda_{i} > 0
  • From  \prod \lambda_i = det(A), a positive definite matrix is always invertible; but invertibility only requires that no eigenvalue is zero, not positive definiteness.

Spectral decomposition (a.k.a. eigendecomposition) for symmetric matrix

Any symmetric matrix can be decomposed as a weighted sum of normalized dyads. Let  A \in \mathbb{R}^{n, n} with  A^{T} = A; then A can be described by its eigenvalues and eigenvectors:
 A = U \Lambda U^{T} = \sum_{i=1}^{n} \lambda_{i} u_{i} u_{i}^{T}
Or,  U^{T} A U = \Lambda, where  U is orthogonal and  \Lambda is diagonal.
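The decomposition can be verified directly with numpy's `eigh` (a sketch on a random symmetric matrix of my choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                       # symmetric matrix

lam, U = np.linalg.eigh(A)              # eigenvalues, orthonormal eigenvectors
assert np.allclose(U @ np.diag(lam) @ U.T, A)   # A = U Lambda U^T
assert np.allclose(U.T @ U, np.eye(4))          # U is orthogonal

# Equivalently, a weighted sum of normalized dyads u_i u_i^T:
dyads = sum(l * np.outer(u, u) for l, u in zip(lam, U.T))
assert np.allclose(dyads, A)
```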

Singular value decomposition

In words, the singular value theorem states that any matrix can be written as a sum of simpler matrices (orthogonal dyads).
 A \in \mathbb{R}^{m, n}
 rank(A) = r
Then A can be described by the singular values of  A (i.e., the square roots of the nonzero eigenvalues of  A^{T}A) and the eigenvectors of  A^{T}A and  AA^{T} as follows.
 A = \tilde{U} \tilde{\Sigma} \tilde{V}^T
Or, in compact form,
 A = U \Sigma V^{T}=\sum_{i=1}^{r}\sigma_{i}u_{i}v_{i}^{T}.
 U \in \mathbb{R}^{m, r}, V \in \mathbb{R}^{n, r}, \Sigma \in \mathbb{R}^{r, r}
 \sigma_i^{2} = \lambda_i(AA^{T}) = \lambda_i(A^{T}A) \geq 0 (because  AA^{T} and  A^{T}A are p.s.d.)
The columns of  V and  U are orthonormal.
* SVD, range, and nullspace
The first  r columns of  U form an orthonormal basis of the range (column space) of A.
The last  n-r columns of  V form an orthonormal basis of the nullspace of A.
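The compact SVD and its connection to the range and nullspace can be sketched in numpy (random rank-2 matrix of my choosing):

```python
import numpy as np

rng = np.random.default_rng(3)
# Build a 5x4 matrix of rank r = 2 as a product of thin factors.
B = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
r = np.linalg.matrix_rank(B)               # 2

U, s, Vt = np.linalg.svd(B)                # full SVD
Ur, sr, Vr = U[:, :r], s[:r], Vt[:r]       # compact, rank-r pieces
assert np.allclose(Ur @ np.diag(sr) @ Vr, B)   # A = sum sigma_i u_i v_i^T

# The last n - r rows of V^T (columns of V) span the nullspace of B.
null_basis = Vt[r:].T                      # shape (4, 2)
assert np.allclose(B @ null_basis, 0)

# sigma_i^2 equal the nonzero eigenvalues of B^T B.
top_eigs = np.sort(np.linalg.eigvalsh(B.T @ B))[::-1][:r]
assert np.allclose(np.sqrt(top_eigs), sr)
```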

Cholesky decomposition of p.s.d. and p.d. matrices

If  \exists \mathbf{P} such that  \mathbf{A} = \mathbf{PP^{T}}, then  \mathbf{A} is positive semi-definite.
If  \mathbf{A} is positive semi-definite, then  \exists \mathbf{P} such that  \mathbf{A} = \mathbf{PP^{T}}. That is, any p.s.d. matrix can be written as a product  A = PP^{T}; P is not unique. If A is positive definite, then we can choose a lower triangular matrix for the decomposition,  A = LL^{T}, where L is invertible.
Example of p.s.d. matrix
The variance-covariance matrix is a notable example of a p.s.d. matrix.
 \Sigma = E(x - \hat{x})(x - \hat{x})^{T}, where  \hat{x} = E(x).
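A numpy sketch of both facts (the matrices are random examples of my choosing, made strictly positive definite so `cholesky` applies):

```python
import numpy as np

rng = np.random.default_rng(4)
P = rng.standard_normal((4, 4))
A = P @ P.T + 1e-6 * np.eye(4)       # positive definite by construction

L = np.linalg.cholesky(A)            # lower-triangular Cholesky factor
assert np.allclose(L @ L.T, A)       # A = L L^T
assert np.all(np.linalg.eigvalsh(A) > 0)    # all eigenvalues positive

# A sample covariance matrix is p.s.d. as well.
X = rng.standard_normal((100, 4))    # 100 samples of a 4-dim variable
S = np.cov(X.T)
assert np.all(np.linalg.eigvalsh(S) >= -1e-10)
```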

Rayleigh quotient

Given  A \in \mathbb{S}^{n}, it holds that
 \lambda_{min}(A) \leq \frac{x^{T}Ax}{x^{T}x} \leq \lambda_{max}(A), \ \forall x \neq 0.
Therefore we can solve optimization problems in quadratic form by finding eigenvalues:  \lambda_{max}(A) = \max_{||x||_2 = 1} x^{T}Ax

 \lambda_{min}(A) = \min_{||x||_2 = 1} x^{T}Ax
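The bounds and their attainment are easy to check numerically (random symmetric matrix of my choosing):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                     # symmetric

lam = np.linalg.eigvalsh(A)           # ascending eigenvalues
lmin, lmax = lam[0], lam[-1]

# The Rayleigh quotient stays within [lmin, lmax] for any nonzero x.
for _ in range(1000):
    x = rng.standard_normal(4)
    q = (x @ A @ x) / (x @ x)
    assert lmin - 1e-12 <= q <= lmax + 1e-12

# The extremes are attained at unit eigenvectors.
_, U = np.linalg.eigh(A)
assert np.isclose(U[:, -1] @ A @ U[:, -1], lmax)
assert np.isclose(U[:, 0] @ A @ U[:, 0], lmin)
```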

Properties of eigenvalues and eigenvectors

Type of matrix | Eigenvalues | Eigenvectors
Symmetric:  A^{T}=A | real  \lambda's | orthogonal:  x_{i}^{T}x_{j}=0
Orthogonal:  Q^{T} = Q^{-1} | all  |\lambda|=1 | orthogonal:  x_{i}^{T}x_{j}=0
Positive definite:  x^{T}Ax > 0 | all  \lambda > 0 | orthogonal
Similar matrices:  B=M^{-1}AM |  \lambda(B) = \lambda(A) |  x(B) = M^{-1}x(A)
Projections:  P=P^{2}=P^{T} |  \lambda = 1; 0 | column space; nullspace
Every matrix:  A = U \Sigma V^{T} | rank(A) = rank( \Sigma) | eigenvectors of  A^{T}A, AA^{T} in  V, U

Spatial Data Analysis 2

3D data and causal inference

3D data: points, fields, and surfaces

Transects of data

Digital elevation models

Using space to colocate events

Causal inference

Computation in spatial problems

Minimum distance along a network

Spatial Markov Chains

Optimizing policies in space

Intro to parallel computing

Intro to remote sensing

Light and the electro-magnetic spectrum

Color and spectral bands

Change detection

Cloud removal

Image sharpening

Convolutional filters

Parameters, probability, stocks and flows

3D fields

Variable associations as a function of space

Spatial lags

Flux and divergence

Intro to big data

Regressions in space

Geographically-weighted regression models

Spatial autocorrelation

Spatially correlated errors

Issues arising from aggregation

Bayesian search theory



Convex set and convex function

  • A set  C is convex if and only if
     \forall x_1, \ x_2 \in C, \ \lambda \in [0, 1]: \ \lambda x_1 + (1-\lambda) x_2 \in C
  •  f: \mathbb{R}^n \rightarrow \mathbb{R} is a convex function if and only if
     dom \ f is convex
     \forall x, \ y \in dom \ f, \ \lambda \in [0, 1]:
     f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda)f(y)
  •  f is strongly convex if and only if
     \exists m > 0 s.t.  \tilde{f}(x) = f(x) - \frac{m}{2}||x||_{2}^{2} is convex.
    That is,
     f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda)f(y)-\frac{m}{2}\lambda(1-\lambda)||x-y||_{2}^{2}
    Examples of convex sets
  • empty set
  • a single point
  •  \mathbb{R}^n
  • Halfspace:  a^{T}x \leq b
  • Polyhedra:  Ax \leq b
  • Balls and ellipsoids

Examples of convex function
* All norms on  \mathbb{R}^{n}
*  -\log x
* Affine function:  f(x) = a^{T}x+b
* Max function:  f(x) = \max \left\{ x_1, \ x_2, \ ..., \ x_n \right\}

Epigraph condition

 f is convex if and only if its epigraph  epi\ f:= \left \{(x, t) \in \mathbb{R}^{n+1}: t \geq f(x) \right \}
is convex.

Restriction to a line

 f is convex if and only if its restriction to any line is convex, meaning that for every  x_{0} \in \mathbb{R}^{n} and  v \in \mathbb{R}^{n}, the function  g(t):= f(x_{0} + tv) is convex.

First-order condition

If f is differentiable, f is convex if and only if
 \forall \  x, \ y \in dom \ f
 f(y) \geq f(x) + \nabla f(x)^{T} (y - x)
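The first-order condition says a convex function lies above all of its tangent planes. A numpy sketch for the convex quadratic  f(x) = x^{T}Qx with  Q \succeq 0 (a random example of my choosing):

```python
import numpy as np

rng = np.random.default_rng(6)
P = rng.standard_normal((3, 3))
Q = P @ P.T                      # p.s.d., so f(x) = x^T Q x is convex

def f(x):
    return x @ Q @ x

def grad(x):
    return 2 * Q @ x             # gradient of x^T Q x for symmetric Q

# Check f(y) >= f(x) + grad f(x)^T (y - x) at many random pairs.
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
```

For this quadratic the gap is exactly  (y-x)^{T}Q(y-x) \geq 0, which is why the inequality never fails.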

Second-order condition

If f is twice differentiable, f is convex if and only if
 \nabla^{2} f(x) \ \succeq 0 \ \forall \ x \ \in dom \ f,
i.e., the Hessian is positive semidefinite everywhere on the domain. Positive semidefiniteness of the Hessian is the key to the convexity of a function.

Operations that preserve convexity

  • Intersection
  • Nonnegative linear combination
  • Affine variable transformation
  • Pointwise maximum

Separation theorem

Theorem 1: Separating hyperplane

If  \mathit{C}, \mathit{D} are two convex subsets of  \mathbb{R}^{n} that do not intersect, then there is a hyperplane that separates them, that is,  \exists a \in \mathbb{R}^{n}, a \neq 0 and  b \in \mathbb{R} such that  a^{T}x \ \leq \ b \ \forall \ x\  \in \ \mathit{C}, and  a^{T}x\  \geq \  b \ \forall \ x \ \in \ \mathit{D}.

Theorem 2: Supporting hyperplane

If  \mathit{C} \subseteq \mathbb{R}^{n} is convex and nonempty, then for any  x_{0} at the boundary of  \mathit{C}, there exists a supporting hyperplane to  \mathit{C} at  x_{0}, meaning that there exists  a \in \mathbb{R}^{n}, a \neq 0, such that  a^{T}(x - x_{0}) \leq 0 for every  x \in \mathit{C}.

Composite function and convexity

 f = h \circ g
* If  h,\ g are convex and  h is nondecreasing, then  f is convex.
* If  g is concave and  h is convex and nonincreasing, then  f is convex.

* If  g is convex, then  \exp g(x) is convex.
* If  g is concave and positive, then  \log g(x) is concave.
* If  g is concave and positive, then  \frac{1}{g(x)} is convex.
* If  g is convex and nonnegative and  p \geq 1, then  g(x)^{p} is convex.
* If  g is convex, then  -\log(-g(x)) is convex on  \left\{x: g(x) < 0\right\}.
* If  g_i are convex, then  \ln \sum_{i} \exp g_{i}(x) is convex.

Jensen's inequality

Let  f: \mathbb{R}^{n} \rightarrow \mathbb{R} be a convex function, and let  z \in \mathbb{R}^{n} be a random variable.
 f(\mathbb{E}\left \{z \right\}) \leq \mathbb{E}\left \{f(z) \right\}
For a discrete probability distribution where  z takes value  x^{(i)} with probability  \theta_{i}, this reads  f(\sum_{i} \theta_{i} x^{(i)}) \leq \sum_{i} \theta_{i}f(x^{(i)}), where  \sum_{i} \theta_{i}=1 and  \theta_{i} \geq 0.
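The discrete case can be verified directly in numpy (distribution and convex function are illustrative choices of mine):

```python
import numpy as np

# Discrete distribution: z takes value xs[i] with probability theta[i].
xs = np.array([-1.0, 0.5, 2.0, 3.0])
theta = np.array([0.1, 0.4, 0.3, 0.2])
assert np.isclose(theta.sum(), 1.0)

f = np.exp                        # a convex function

lhs = f(theta @ xs)               # f(E[z])  = f(sum theta_i x_i)
rhs = theta @ f(xs)               # E[f(z)]  = sum theta_i f(x_i)
assert lhs <= rhs                 # Jensen's inequality
```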