0%

Kernels

kernel Functions

We define a kernel function to be a real-valued function of two arguments \(k(\mathbf{x}, \mathbf{x}^\prime) = <\Phi(\mathbf{x}), \Phi(\mathbf{x}^\prime)>_H \in \mathbb{R}\), for \(\mathbf{x}, \mathbf{x}^\prime \in \mathbb{X}\) that measures similarity between two inputs. Where \(\Phi\) is maps into some inner product space \(H\), sometimes called the feature space.

Positive Definite Kernel

It can be shown that a kernel that corresponds to an inner product in some inner product space coincides with the class of positive definite kernels.

Definition 1: Gram Matrix

Given a kernel \(k\) and inputs \(x_1, ...., x_n \in \mathbb{X}\), the \(n \times n\) matrix:

\[K := (k(x_i, x_j))_{ij}\]

is called the Gram matrix or kernel matrix of \(k\) w.r.t \(x_1, ...., x_n\)


Definition 2: Positive Definite Kernel

The function \(k\) is called a positive definite kernel if:

\[\sum^N_{n=1} \sum^N_{m=1} a_n a_m k(\mathbf{x}_n, \mathbf{x}_m) \geq 0\]

for any real numbers \(a_n, a_m\) and points \(\mathbf{x}_n, \mathbf{x}_m \in \mathbb{X}\) and any \(N \in \mathbb{N}\).


Variational Inference: A Review for Statisticians

General Problem Setting:

Consider a joint density of latent variables \(\mathbf{z} = z_{1:m}\) and observations \(\mathbf{x} = x_{1:n}\):

\[p(\mathbf{z}, \mathbf{x}) = p(\mathbf{z}) p(\mathbf{x}|\mathbf{z})\]

The main idea behind variational inference is to use optimization to compute the conditional density of the latent variables given the observations:

\[p(\mathbf{z} | \mathbf{x}) = \frac{p(\mathbf{z}) p(\mathbf{x}|\mathbf{z})}{p(\mathbf{x})}\]

This conditional can be used to produce point or interval estimates of the latent variables, form predictive densities of new data \(p(x^* | \mathbf{x})\) and more.

The quantity:

\[p(\mathbf{x}) = \int_\mathbf{z} p(\mathbf{z}, \mathbf{x}) d\mathbf{z}\]

is called the evidence, for many models, the evidence integral is unavailable in closed form or requires exponential time to compute.

Evidence Lower Bound

In variational inference, we specify a family \(\Theta\) of densities over the latent variables. Each \(q(\mathbf{z})\) is a candidate approximation to the exact conditional. Our goal is to find the best candidate, the one closest in KL divergence to the exact conditional. Inference now amounts to solving the following optimization problem:

\[q^*(\mathbf{z}) = \arg\min_{q(\mathbf{z}) \in \Theta} KL(q(\mathbf{z}) \;||\; p(\mathbf{z} | \mathbf{x})) = \int_{z} q(\mathbf{z})\log\frac{q(\mathbf{z})}{p(\mathbf{z}| \mathbf{x})} = E_{\mathbf{z} \sim q(\mathbf{z})}[\log q(\mathbf{z})] - E_{\mathbf{z} \sim q(\mathbf{z})}[\log p(\mathbf{z} | \mathbf{x})] = E_{\mathbf{z} \sim q(\mathbf{z})}[\log q(\mathbf{z})] - E_{\mathbf{z} \sim q(\mathbf{z})}[\log p(\mathbf{z}, \mathbf{x})] + \log p(\mathbf{x})\]

However, this objective is not computable because it requires computing the evidence \(\log p(\mathbf{x})\). Since we cannot compute KL, we optimize an alternative objective that is equivalent ot the KL up to an added constant:

\[ELBO(q) = E_{\mathbf{z} \sim q(\mathbf{z})} [\log p(\mathbf{z}, \mathbf{x})] - E_{\mathbf{z} \sim q(\mathbf{z})}[\log q(\mathbf{z})]\]

This function is called the evidence lower bound (ELBO), ELBO is the negative KL divergence plus a constant \(\log p(\mathbf{x})\) w.r.t \(q(\mathbf{z})\). Maximizing ELBO is equivalent to minimizing the KL divergence.


Property 1: ELBO is the Balance Between Likelihood and Prior

Continue from above equation of \(ELBO(q)\), we can rewrite it as (\(E[\cdot] := E_{\mathbf{z} \sim q(\mathbf{z})}[\cdot]\)):

\[ELBO(q) = E[\log p(\mathbf{z})] - E[\log q(\mathbf{z})] + E[\log p(\mathbf{x} | \mathbf{z})] = -KL(q(\mathbf{z}) | p(\mathbf{z})) + E[\log p(\mathbf{x} | \mathbf{z})]\]

In order to find a \(q(\mathbf{z})\) that maximize ELBO, we have to maximize the expected log likelihood of the data given the hidden variables and minimize the distance to the prior (so the prior acts as a regularizer).

Property 2: ELBO is the Lower Bound of the Log Evidence

ELBO lower bounds the log evidence:

\[\log p(\mathbf{x}) = KL(q(\mathbf{z}) \;||\; p(\mathbf{z} | \mathbf{x})) \geq ELBO (q), \quad \quad \forall q \in \Theta\]

REF

https://arxiv.org/pdf/2205.14415.pdf

Time Series (3)

Spectral Analysis and Filtering

We define a cycle as one complete period of a sine or cosine function defined over a unit time interval:

\[X_t = A\cos(2\pi\omega t + \phi) \quad \quad (4.1)\]

for \(t = 0, \pm 1, \pm 2, ...\), where:

  • \(\omega\) is a frequency index, defined in cycles per unit time, so if \(\omega = 2\), for every 1 time unit, we have \(2\) cycles, this is fixed.
  • \(A\) determining the height or amplitude of the function, this is random.
  • \(\phi\) determining the start point of the cosine function, this is random.


Using the trigonometric identity, we can write above equation as:

\[X_t = U_1 \cos (2 \pi \omega t) + U_2 \sin (2 \pi \omega t) \quad \quad (4.2)\]

Where:

  • \(U_1 = A \cos \phi\) is often taken to be normally distributed random variable.
  • \(U_2 = - A \sin \phi\) is often taken to be normally distributed random variable.
  • \(A = \sqrt{U_1^2 + U_2^2}\)
  • \(\phi = \tan^{-1} (-\frac{U_2}{U_1})\)


Read more »

Rigorous Probability (2)

Conditional Expectations

Elementary Conditional Probabilities

Definition 8.2 Conditional Probability

Let \((\Omega, \mathbb{A}, P)\) be a probability space adn \(B \in \mathbb{A}\). We define the conditional probability given \(B\) for any \(A \in \mathbb{A}\) by:

\[P(A | B) = \frac{P(A \cap B)}{P(B)}\]

with \(P(B) > 0\).


Theorem 8.4: \(P(\cdot | B)\) is a Probability Measure

If \(P(B) > 0\), then \(P(\cdot | B)\) is a probability measure on \((\Omega, \mathbb{A})\).


Theorem 8.5: Conditional Probability of Independent Events

Let \(A, B \in \mathbb{A}\) with \(P(A), P(B) > 0\). Then, belows are equivalent:

  1. \(A, B\) are independent.
  2. \(P(A | B) = P(A)\)
  3. \(P(B | A) = P(B)\)


Read more »

Measure Integral and Real Analysis (4)

Hilbert Spaces

Inner Product Spaces

Definition 8.1: Inner Product; Inner Product Space

An inner product on a vector space \(V\) is a function that takes each ordered pair \(f, g\) of elements of \(V\) to a number \(<f, g> \in \mathbb{F}\) and has the following properties:

  • Positivity: \[<f, f> \in [0, \infty), \; \forall f \in V\]
  • Definiteness: \[<f, f> = 0 \text{ IFF } f = 0\]
  • Linearity in first slot: \[<f + g, h> = <f, h> + <g, h>, \quad <\alpha f, g> = \alpha<f, g>, \quad \forall f,g,h \in V, \alpha \in \mathbb{F}\]
  • Conjugate Symmetry: \[<f, g> = \overline{<g, f>}, \; \forall f, g \in V\]

A vector space with inner product is called an inner product space.

If \(\mathbb{F} = \mathbb{R}\), then the complex conjugate can be ignored as the symmetry property.


Theorem 8.3: Basic Properties of an Inner Product

Suppose \(V\) is an inner product space. Then:

  1. \(<0, g> = <g, 0> = 0, \; \forall g \in V\)
  2. \(<f, g + h> = <f, g> + <f, h>, \; \forall f, g, h \in V\)
  3. \(<f, \alpha g> = \bar{\alpha} <f, g>, \; \forall \alpha \in \mathbb{F}, f, g \in V\)


Definition 8.4: Norm Associated with an Inner Product; \(\|\cdot\|\)

Suppose \(V\) is an inner product space. For \(f \in V\), define the norm of \(f\), denoted \(\|f\|\), by:

\[\|f\| = \sqrt{<f, f>}\]


Theorem 8.6: Homogeneity of the Norm

Suppose \(V\) is an inner product space, \(f \in V\), and \(\alpha \in \mathbb{F}\). Then:

\[\|\alpha f\| = |\alpha|\|f\|\]


Definition 8.7: Orthogonal

Two elements of an inner product space are called orthogonal if their inner product equals \(0\).


Definition 8.24: Distance from a Point to a Set

Suppose \(U\) is a nonempty subset of a normed vector space \(V\) and \(f \in V\). The distance from \(f\) to \(U\), denoted \(distance(f, U)\), is defined by:

\[distance(f, U) = \inf\{\|f - g\|: g \in U\}\]


Read more »

Measure Integral and Real Analysis (3)

Banach Spaces

Metric Spaces

Definition 6.1: Metric Space

A metric oon a nonempty set \(V\) is a function \(d: V \times V \rightarrow [0, \infty)\) s.t:

  • \(d(f, f) = 0, \; \forall f \in V\).
  • If \(f, g \in V\) and \(d(f, g) = 0\), then \(f = g\).
  • \(d(f, g) = d(g, f), \; \forall f, g \in V\).
  • \(d(f, h) \leq d(f, g) + f(g, h), \; \forall f, g, h \in V\). (Triangle inequality)

A metric space is a pair \((V, d)\), where \(V\) is a nonempty set adn \(d\) is a metric on \(V\).


Definition 6.3: Open Ball; \(B(f, r)\); Closed Ball \(\bar{B}(f, r)\)

Suppose \((V, d)\) is a metric space, \(f \in V\) and \(r > 0\):

  • The open ball centered at \(f\) with radius \(r\) is denoted \(B(f, r)\) and is defined by: \[B(f, r) = \{g \in V: d(f, g) < r\}\]
  • The closed ball centered at \(f\) with radius \(r\) is denoted \(\bar{B}(f, r)\) and is defined by: \[\bar{B}(f, r) = \{g \in V: d(f, g) \leq r\}\]


Definition 6.4: Open

A subset \(G\) of a metric space \(V\) is called open if for every \(f \in G\), there exists \(r > 0\) s.t \(B(f, r) \subseteq G\).


Theorem 6.5: Open balls are Open

Suppose \(V\) is a metric space, \(f \in V\), and \(r > 0\). Then \(B(f, r)\) is an open subset of \(V\).

Proof of Theorem 6.5:

Suppose \(g \in B(f, r)\). We need to show there exists an open ball centered at \(g\) with radius \(k\) s.t \(B(g, k) \subseteq B(f, r)\). Let \(k = r - d(f, g)\), and \(h \in B(g, r - d(f, g))\), then:

\[d(f, h) \leq d(f, g) + d(g, h) < d(f, g) + r - d(f, g) = r\]

Thus, \(h \in B(f, r), \; \forall h \in B(g, k)\), so \(B(f, r)\) is open.


Definition 6.6: Closed

A subset of a metric space \(V\) is called closed if its complement in \(V\) is open.


Definition 6.7: Closure; \(\bar{E}\)

Suppose \(V\) is a metric space and \(E \subseteq V\). The closure of \(E\), denoted \(\bar{E}\), is defined by:

\[\bar{E} = \{g \in V: B(g, \epsilon) \cap E \neq \emptyset \; \forall \epsilon > 0\}\]


Definition 6.8: Limit in Metric Space; \(\lim_{k \rightarrow \infty} f_k\)

Suppose \((V, d)\) is a metric space, \(f_1, f_2, ...\) is a sequence in \(V\), and \(f \in V\). Then:

\[\lim_{k\rightarrow \infty} f_k = f\]

means

\[\lim_{k \rightarrow \infty} d(f, f_k) = 0\]


Read more »

Rigorous Probability (1)

Preliminaries

For \(x, y \in \mathbb{\bar{R}} := \mathbb{R} \cup \{-\infty, \infty\}\), we agree on the following notation:

  1. \(x \vee y = \max (x, y)\)
  2. \(x \wedge y = \min(x, y)\)
  3. \(x^+ = \max(x, 0)\)
  4. \(x^- = \max(-x, 0)\)
  5. \(|x| = \max(x, -x) = x^- + x^+\)
  6. \(sign(x) = I_{x > 0} - I_{x < 0}\)

Class of Sets

Let \(\Omega \neq \emptyset\), let \(\mathbf{A} \subseteq 2^{\Omega}\) (set of all possible subsets of \(\Omega\)) be a class of subsets of \(\Omega\).

Definition 1.1: \(\cap\)-closed, \(\sigma-\cap\)-closed, \(\cup\)-closed, \(\sigma-\cup\)-closed, \(/\)-closed, \(A^c\)-closed

A class of sets \(\mathbf{A} \in 2^{\Omega}\) is called:

  • \(\cap\)-closed (closed under intersections) or \(\pi\)-system if \(A \cap B \in \mathbf{A}\), whenever \(A, B \in \mathbf{A}\).
  • \(\sigma-\cap-\)closed (closed under countable intersections) if \(\bigcap^{\infty}_{n=1}A_n \in \mathbf{A}\) for any choice of countably (finite or countably infinite) many sets \(A_1, A_2, ...., \in \mathbf{A}\).
  • \(\cup-\)closed (closed under unions) if \(A \cup B \in \mathbf{A}\), whenever \(A, B \in \mathbf{A}\).
  • \(\sigma-\cup-\)closed (closed under countable unions) if \(\bigcup^{\infty}_{n=1}A_n \in \mathbf{A}\) for any choice of countably (finite or countably infinite) many sets \(A_1, A_2, ...., \in \mathbf{A}\).
  • \(/-\)closed (closed under differences) if \(A / B \in \mathbf{A}\), whenever \(A, B \in \mathbf{A}\).
  • closed under complements if \(A^c := \Omega / A \in A\) for any set \(A \in \mathbf{A}\).


Definition 1.2: \(\sigma-\)algebra

A class of sets \(A \subseteq 2\) is called a \(\sigma\)-algebra if it fulfills the following three properties:

  1. \(\Omega \in \mathbf{A}\).
  2. \(\mathbf{A}\) is closed under complements.
  3. \(\mathbf{A}\) is closed under countable unions.

If \(\mathbf{A}\) is \(\sigma\)-algebra, we also have:

  • \(\mathbf{A}\) is \(\cup\)-closed \(\Longleftrightarrow\) \(\mathbf{A}\) is \(\cap\)-closed.
  • \(\mathbf{A}\) is \(\sigma-\cup\)-closed \(\Longleftrightarrow\) \(\mathbf{A}\) is \(\sigma-\cap\)-closed
  • \(\mathbf{A}\) is closed under differences
  • Any countable union of sets in \(\mathbf{A}\) cna be expressed as a countable disjoint union of sets in \(\mathbf{A}\)


Definition 1.6: Algebra

A class of sets \(\mathbf{A} \subseteq 2^{\pi}\) is called an algebra if the following three conditions are fulfilled:

  1. \(\Omega \in \mathbf{A}\).
  2. \(\mathbf{A}\) is \(/\)-closed.
  3. \(\mathbf{A}\) is \(\cup\)-closed.

If \(\mathbf{A}\) is Algebra, we also have:

  1. \(\mathbf{A}\) is closed under complements.
  2. \(\mathbf{A}\) is closed under intersections.


Read more »

Measure Integral and Real Analysis (2)

  1. \((A \times C) / (B \times D) = [A \times (C / D)] \cup [(A / B) \times C]\)
  2. \((A \cap B) \times (C \cap D) = (A \times C) \cap (B \times D)\)
  3. \((A \cup B) \times (C \cup D) \neq (A \times C) \cup (B \times D)\)

Integration

Integration w.r.t a Measure

Definition 3.1: \(S\)-Partition

Suppose \(S\) is a \(\sigma\)-algebra on a set \(X\). An \(S\)-partition of \(X\) is finite collection \(A_1, ..., A_m\) of disjoint sets in \(S\) s.t \(A_1 \cup ... \cup A_m = X\).


Definition 3.1: Lower Lebesgue Sum

Suppose \((X, S, \mu)\) is a measure space, \(f: X \rightarrow [0, \infty]\) is an \(S\)-measurable function, and \(P\) is an \(S\)-partition \(A_1, ..., A_m\) of \(X\). The lower Lebesgue sum \(L(f, P)\) is defined by:

\[L(f, P) = \sum^m_{j=1} \mu(A_j) \inf_{A_j} f\]

The definition does not assume \(X\) is a subset of \(\mathbb{R}\).


Definition 3.3: Integral of a Nonnegative Function

Suppose \((X, S, \mu)\) is a measure space and \(f: X \rightarrow [0, \infty]\) is an \(S\)-measurable function. The integral of \(f\) w.r.t \(\mu\), denoted by \(\int f d\mu\) is defined by:

\[\int f d\mu = \sup\{L(f, P): \text{ $P$ is an $S$-partition of $X$}\}\]


Theorem 3.4: Integral of a Characteristic Function \(\chi_E\)

Suppose \((X, S, \mu)\) is a measure space and \(E \in S\). Then:

\[\int \chi_E d\mu = \mu(E)\]

Read more »

Measure Integral and Real Analysis (1)

\(X / A\) means that the difference between set \(X\) and \(A\).

Fields

Complete Ordered Fields

Definition 0.1: Field

A field is a set \(\mathbb{F}\) along with closed operations of addition and multiplication on \(\mathbb{F}\) that have the following properties:

  1. Commutativity: \[a + b = b + a, \quad ab = ba \quad \forall a, b \in \mathbb{F}\]
  2. Associativity: \[(a + b) + c = a + (b + c) \quad (ab)c = a(bc) \quad \forall a,b,c \in \mathbb{F}\]
  3. Distributive Property: \[a(b + c) = ab + ac \quad \forall a, b, c \in \mathbb{F}\]
  4. Additive Identity: There exists an element \(0 \in \mathbb{F}\) s.t. \(a + 0 = a, \forall a \in \mathbb{F}\)
  5. Additive Inverse: For each \(a \in \mathbb{F}\), there exists an element \(-a \in \mathbb{F}\) such that \(a + (-a) = 0\).
  6. Multiplicative Identity: There exists an element \(1 \in \mathbb{F}\) s.t \(1 \neq 0\) and \(a1 = a, \forall a \in \mathbb{F}\).
  7. Multiplicative Inverse: For each \(a \in \mathbb{F}\) with \(a \neq 0\), there exists an element \(a^{-1} \in \mathbb{F}\) s.t \(aa^{-1} = 1\)

\(\mathbb{Q}, \mathbb{R}, \mathbb{C}, \{0, 1\}\) with usual operation of addition and multiplication are fields.


Read more »

Linear Algebra (5)

Operators on Complex Vector Spaces

Null Spaces of Powers of an Operator

Theorem 8.2: Sequence of Increasing Null Spaces

Suppose \(T \in L(V)\). Then

\[\{0\} = \text{null }T^0 \subseteq \text{null }T^1 \subseteq ... \subseteq \text{null }T^k \subseteq .... \]

where \(T^0 = I\) is the identity mapping in \(V\).

Proof of Theorem 8.2:

Suppose \(k\) is non-negative integer and \(v \in \text{null }T^k\), then \(T^k(v) = 0 \implies T^{k+1}(v) = TT^k(v) = T(0) = 0\), thus, \(\forall v \in \text{null }T^k\), \(v \in \text{null }T^{k+1}\).


Read more »