kernels

Posted on 2022-06-12 Edited on 2022-06-14 In Background Views:

Symbols count in article: 939 Reading time ≈ 1 mins.

Kernels

kernel Functions

We define a kernel function to be a real-valued function of two arguments $k(\mathbf{x}, \mathbf{x}^\prime) = <\Phi(\mathbf{x}), \Phi(\mathbf{x}^\prime)>_H \in \mathbb{R}$, for $\mathbf{x}, \mathbf{x}^\prime \in \mathbb{X}$ that measures similarity between two inputs. Where $\Phi$ is maps into some inner product space $H$, sometimes called the feature space.

Positive Definite Kernel

It can be shown that a kernel that corresponds to an inner product in some inner product space coincides with the class of positive definite kernels.

Definition 1: Gram Matrix

Given a kernel $k$ and inputs $x_1, ...., x_n \in \mathbb{X}$, the $n \times n$ matrix:

\[K := (k(x_i, x_j))_{ij}\]

is called the Gram matrix or kernel matrix of $k$ w.r.t $x_1, ...., x_n$

Definition 2: Positive Definite Kernel

The function $k$ is called a positive definite kernel if:

\[\sum^N_{n=1} \sum^N_{m=1} a_n a_m k(\mathbf{x}_n, \mathbf{x}_m) \geq 0\]

for any real numbers $a_n, a_m$ and points $\mathbf{x}_n, \mathbf{x}_m \in \mathbb{X}$ and any $N \in \mathbb{N}$.

variational_inference

Posted on 2022-06-06 In ML Views:

Symbols count in article: 2.7k Reading time ≈ 2 mins.

Variational Inference: A Review for Statisticians

General Problem Setting:

Consider a joint density of latent variables $\mathbf{z} = z_{1:m}$ and observations $\mathbf{x} = x_{1:n}$:

\[p(\mathbf{z}, \mathbf{x}) = p(\mathbf{z}) p(\mathbf{x}|\mathbf{z})\]

The main idea behind variational inference is to use optimization to compute the conditional density of the latent variables given the observations:

\[p(\mathbf{z} | \mathbf{x}) = \frac{p(\mathbf{z}) p(\mathbf{x}|\mathbf{z})}{p(\mathbf{x})}\]

This conditional can be used to produce point or interval estimates of the latent variables, form predictive densities of new data $p(x^* | \mathbf{x})$ and more.

The quantity:

\[p(\mathbf{x}) = \int_\mathbf{z} p(\mathbf{z}, \mathbf{x}) d\mathbf{z}\]

is called the evidence, for many models, the evidence integral is unavailable in closed form or requires exponential time to compute.

Evidence Lower Bound

In variational inference, we specify a family $\Theta$ of densities over the latent variables. Each $q(\mathbf{z})$ is a candidate approximation to the exact conditional. Our goal is to find the best candidate, the one closest in KL divergence to the exact conditional. Inference now amounts to solving the following optimization problem:

\[q^*(\mathbf{z}) = \arg\min_{q(\mathbf{z}) \in \Theta} KL(q(\mathbf{z}) \;||\; p(\mathbf{z} | \mathbf{x})) = \int_{z} q(\mathbf{z})\log\frac{q(\mathbf{z})}{p(\mathbf{z}| \mathbf{x})} = E_{\mathbf{z} \sim q(\mathbf{z})}[\log q(\mathbf{z})] - E_{\mathbf{z} \sim q(\mathbf{z})}[\log p(\mathbf{z} | \mathbf{x})] = E_{\mathbf{z} \sim q(\mathbf{z})}[\log q(\mathbf{z})] - E_{\mathbf{z} \sim q(\mathbf{z})}[\log p(\mathbf{z}, \mathbf{x})] + \log p(\mathbf{x})\]

However, this objective is not computable because it requires computing the evidence $\log p(\mathbf{x})$. Since we cannot compute KL, we optimize an alternative objective that is equivalent ot the KL up to an added constant:

\[ELBO(q) = E_{\mathbf{z} \sim q(\mathbf{z})} [\log p(\mathbf{z}, \mathbf{x})] - E_{\mathbf{z} \sim q(\mathbf{z})}[\log q(\mathbf{z})]\]

This function is called the evidence lower bound (ELBO), ELBO is the negative KL divergence plus a constant $\log p(\mathbf{x})$ w.r.t $q(\mathbf{z})$. Maximizing ELBO is equivalent to minimizing the KL divergence.

Property 1: ELBO is the Balance Between Likelihood and Prior

Continue from above equation of $ELBO(q)$, we can rewrite it as ($E[\cdot] := E_{\mathbf{z} \sim q(\mathbf{z})}[\cdot]$):

\[ELBO(q) = E[\log p(\mathbf{z})] - E[\log q(\mathbf{z})] + E[\log p(\mathbf{x} | \mathbf{z})] = -KL(q(\mathbf{z}) | p(\mathbf{z})) + E[\log p(\mathbf{x} | \mathbf{z})]\]

In order to find a $q(\mathbf{z})$ that maximize ELBO, we have to maximize the expected log likelihood of the data given the hidden variables and minimize the distance to the prior (so the prior acts as a regularizer).

Property 2: ELBO is the Lower Bound of the Log Evidence

ELBO lower bounds the log evidence:

\[\log p(\mathbf{x}) = KL(q(\mathbf{z}) \;||\; p(\mathbf{z} | \mathbf{x})) \geq ELBO (q), \quad \quad \forall q \in \Theta\]

REF

https://arxiv.org/pdf/2205.14415.pdf

time-series-3

Posted on 2022-05-28 Edited on 2022-06-17 In Background Views:

Symbols count in article: 5.3k Reading time ≈ 5 mins.

Time Series (3)

Spectral Analysis and Filtering

We define a cycle as one complete period of a sine or cosine function defined over a unit time interval:

\[X_t = A\cos(2\pi\omega t + \phi) \quad \quad (4.1)\]

for $t = 0, \pm 1, \pm 2, ...$, where:

$\omega$ is a frequency index, defined in cycles per unit time, so if $\omega = 2$, for every 1 time unit, we have $2$ cycles, this is fixed.
$A$ determining the height or amplitude of the function, this is random.
$\phi$ determining the start point of the cosine function, this is random.

Using the trigonometric identity, we can write above equation as:

\[X_t = U_1 \cos (2 \pi \omega t) + U_2 \sin (2 \pi \omega t) \quad \quad (4.2)\]

Where:

$U_1 = A \cos \phi$ is often taken to be normally distributed random variable.
$U_2 = - A \sin \phi$ is often taken to be normally distributed random variable.
$A = \sqrt{U_1^2 + U_2^2}$
$\phi = \tan^{-1} (-\frac{U_2}{U_1})$

Rigorous Probability (2)

Posted on 2022-05-02 Edited on 2022-06-17 In Background Views:

Symbols count in article: 6.9k Reading time ≈ 6 mins.

Rigorous Probability (2)

Conditional Expectations

Elementary Conditional Probabilities

Definition 8.2 Conditional Probability

Let $(\Omega, \mathbb{A}, P)$ be a probability space adn $B \in \mathbb{A}$. We define the conditional probability given $B$ for any $A \in \mathbb{A}$ by:

\[P(A | B) = \frac{P(A \cap B)}{P(B)}\]

with $P(B) > 0$.

Theorem 8.4: $P(\cdot | B)$ is a Probability Measure

If $P(B) > 0$, then $P(\cdot | B)$ is a probability measure on $(\Omega, \mathbb{A})$.

Theorem 8.5: Conditional Probability of Independent Events

Let $A, B \in \mathbb{A}$ with $P(A), P(B) > 0$. Then, belows are equivalent:

$A, B$ are independent.
$P(A | B) = P(A)$
$P(B | A) = P(B)$

Measure Integral and Real Analysis (4)

Posted on 2022-03-30 Edited on 2022-06-17 In Background Views:

Symbols count in article: 19k Reading time ≈ 18 mins.

Measure Integral and Real Analysis (4)

Hilbert Spaces

Inner Product Spaces

Definition 8.1: Inner Product; Inner Product Space

An inner product on a vector space $V$ is a function that takes each ordered pair $f, g$ of elements of $V$ to a number $<f, g> \in \mathbb{F}$ and has the following properties:

Positivity: \[<f, f> \in [0, \infty), \; \forall f \in V\]
Definiteness: \[<f, f> = 0 \text{ IFF } f = 0\]
Linearity in first slot: \[<f + g, h> = <f, h> + <g, h>, \quad <\alpha f, g> = \alpha<f, g>, \quad \forall f,g,h \in V, \alpha \in \mathbb{F}\]
Conjugate Symmetry: \[<f, g> = \overline{<g, f>}, \; \forall f, g \in V\]

A vector space with inner product is called an inner product space.

If $\mathbb{F} = \mathbb{R}$, then the complex conjugate can be ignored as the symmetry property.

Theorem 8.3: Basic Properties of an Inner Product

Suppose $V$ is an inner product space. Then:

$<0, g> = <g, 0> = 0, \; \forall g \in V$
$<f, g + h> = <f, g> + <f, h>, \; \forall f, g, h \in V$
$<f, \alpha g> = \bar{\alpha} <f, g>, \; \forall \alpha \in \mathbb{F}, f, g \in V$

Definition 8.4: Norm Associated with an Inner Product; $\|\cdot\|$

Suppose $V$ is an inner product space. For $f \in V$, define the norm of $f$, denoted $\|f\|$, by:

\[\|f\| = \sqrt{<f, f>}\]

Theorem 8.6: Homogeneity of the Norm

Suppose $V$ is an inner product space, $f \in V$, and $\alpha \in \mathbb{F}$. Then:

\[\|\alpha f\| = |\alpha|\|f\|\]

Definition 8.7: Orthogonal

Two elements of an inner product space are called orthogonal if their inner product equals $0$.

Definition 8.24: Distance from a Point to a Set

Suppose $U$ is a nonempty subset of a normed vector space $V$ and $f \in V$. The distance from $f$ to $U$, denoted $distance(f, U)$, is defined by:

\[distance(f, U) = \inf\{\|f - g\|: g \in U\}\]

Measure Integral and Real Analysis (3)

Posted on 2022-03-14 Edited on 2022-06-17 In Background Views:

Symbols count in article: 20k Reading time ≈ 18 mins.

Measure Integral and Real Analysis (3)

Banach Spaces

Metric Spaces

Definition 6.1: Metric Space

A metric oon a nonempty set $V$ is a function $d: V \times V \rightarrow [0, \infty)$ s.t:

$d(f, f) = 0, \; \forall f \in V$.
If $f, g \in V$ and $d(f, g) = 0$, then $f = g$.
$d(f, g) = d(g, f), \; \forall f, g \in V$.
$d(f, h) \leq d(f, g) + f(g, h), \; \forall f, g, h \in V$. (Triangle inequality)

A metric space is a pair $(V, d)$, where $V$ is a nonempty set adn $d$ is a metric on $V$.

Definition 6.3: Open Ball; $B(f, r)$; Closed Ball $\bar{B}(f, r)$

Suppose $(V, d)$ is a metric space, $f \in V$ and $r > 0$:

The open ball centered at $f$ with radius $r$ is denoted $B(f, r)$ and is defined by: \[B(f, r) = \{g \in V: d(f, g) < r\}\]
The closed ball centered at $f$ with radius $r$ is denoted $\bar{B}(f, r)$ and is defined by: \[\bar{B}(f, r) = \{g \in V: d(f, g) \leq r\}\]

Definition 6.4: Open

A subset $G$ of a metric space $V$ is called open if for every $f \in G$, there exists $r > 0$ s.t $B(f, r) \subseteq G$.

Theorem 6.5: Open balls are Open

Suppose $V$ is a metric space, $f \in V$, and $r > 0$. Then $B(f, r)$ is an open subset of $V$.

Proof of Theorem 6.5:

Suppose $g \in B(f, r)$. We need to show there exists an open ball centered at $g$ with radius $k$ s.t $B(g, k) \subseteq B(f, r)$. Let $k = r - d(f, g)$, and $h \in B(g, r - d(f, g))$, then:

\[d(f, h) \leq d(f, g) + d(g, h) < d(f, g) + r - d(f, g) = r\]

Thus, $h \in B(f, r), \; \forall h \in B(g, k)$, so $B(f, r)$ is open.

Definition 6.6: Closed

A subset of a metric space $V$ is called closed if its complement in $V$ is open.

Definition 6.7: Closure; $\bar{E}$

Suppose $V$ is a metric space and $E \subseteq V$. The closure of $E$, denoted $\bar{E}$, is defined by:

\[\bar{E} = \{g \in V: B(g, \epsilon) \cap E \neq \emptyset \; \forall \epsilon > 0\}\]

Definition 6.8: Limit in Metric Space; $\lim_{k \rightarrow \infty} f_k$

Suppose $(V, d)$ is a metric space, $f_1, f_2, ...$ is a sequence in $V$, and $f \in V$. Then:

\[\lim_{k\rightarrow \infty} f_k = f\]

means

\[\lim_{k \rightarrow \infty} d(f, f_k) = 0\]

Rigorous Probability (1)

Posted on 2022-03-10 Edited on 2022-06-17 In Background Views:

Symbols count in article: 38k Reading time ≈ 34 mins.

Rigorous Probability (1)

Preliminaries

For $x, y \in \mathbb{\bar{R}} := \mathbb{R} \cup \{-\infty, \infty\}$, we agree on the following notation:

$x \vee y = \max (x, y)$
$x \wedge y = \min(x, y)$
$x^+ = \max(x, 0)$
$x^- = \max(-x, 0)$
$|x| = \max(x, -x) = x^- + x^+$
$sign(x) = I_{x > 0} - I_{x < 0}$

Class of Sets

Let $\Omega \neq \emptyset$, let $\mathbf{A} \subseteq 2^{\Omega}$ (set of all possible subsets of $\Omega$) be a class of subsets of $\Omega$.

Definition 1.1: $\cap$-closed, $\sigma-\cap$-closed, $\cup$-closed, $\sigma-\cup$-closed, $/$-closed, $A^c$-closed

A class of sets $\mathbf{A} \in 2^{\Omega}$ is called:

$\cap$-closed (closed under intersections) or $\pi$-system if $A \cap B \in \mathbf{A}$, whenever $A, B \in \mathbf{A}$.
$\sigma-\cap-$closed (closed under countable intersections) if $\bigcap^{\infty}_{n=1}A_n \in \mathbf{A}$ for any choice of countably (finite or countably infinite) many sets $A_1, A_2, ...., \in \mathbf{A}$.
$\cup-$closed (closed under unions) if $A \cup B \in \mathbf{A}$, whenever $A, B \in \mathbf{A}$.
$\sigma-\cup-$closed (closed under countable unions) if $\bigcup^{\infty}_{n=1}A_n \in \mathbf{A}$ for any choice of countably (finite or countably infinite) many sets $A_1, A_2, ...., \in \mathbf{A}$.
$/-$closed (closed under differences) if $A / B \in \mathbf{A}$, whenever $A, B \in \mathbf{A}$.
closed under complements if $A^c := \Omega / A \in A$ for any set $A \in \mathbf{A}$.

Definition 1.2: $\sigma-$algebra

A class of sets $A \subseteq 2$ is called a $\sigma$-algebra if it fulfills the following three properties:

$\Omega \in \mathbf{A}$.
$\mathbf{A}$ is closed under complements.
$\mathbf{A}$ is closed under countable unions.

If $\mathbf{A}$ is $\sigma$-algebra, we also have:

$\mathbf{A}$ is $\cup$-closed $\Longleftrightarrow$ $\mathbf{A}$ is $\cap$-closed.
$\mathbf{A}$ is $\sigma-\cup$-closed $\Longleftrightarrow$ $\mathbf{A}$ is $\sigma-\cap$-closed
$\mathbf{A}$ is closed under differences
Any countable union of sets in $\mathbf{A}$ cna be expressed as a countable disjoint union of sets in $\mathbf{A}$

Definition 1.6: Algebra

A class of sets $\mathbf{A} \subseteq 2^{\pi}$ is called an algebra if the following three conditions are fulfilled:

$\Omega \in \mathbf{A}$.
$\mathbf{A}$ is $/$-closed.
$\mathbf{A}$ is $\cup$-closed.

If $\mathbf{A}$ is Algebra, we also have:

$\mathbf{A}$ is closed under complements.
$\mathbf{A}$ is closed under intersections.

Measure Integral and Real Analysis (2)

Posted on 2022-02-27 Edited on 2022-03-12 In Background Views:

Symbols count in article: 32k Reading time ≈ 29 mins.

Measure Integral and Real Analysis (2)

$(A \times C) / (B \times D) = [A \times (C / D)] \cup [(A / B) \times C]$
$(A \cap B) \times (C \cap D) = (A \times C) \cap (B \times D)$
$(A \cup B) \times (C \cup D) \neq (A \times C) \cup (B \times D)$

Integration

Integration w.r.t a Measure

Definition 3.1: $S$-Partition

Suppose $S$ is a $\sigma$-algebra on a set $X$. An $S$-partition of $X$ is finite collection $A_1, ..., A_m$ of disjoint sets in $S$ s.t $A_1 \cup ... \cup A_m = X$.

Definition 3.1: Lower Lebesgue Sum

Suppose $(X, S, \mu)$ is a measure space, $f: X \rightarrow [0, \infty]$ is an $S$-measurable function, and $P$ is an $S$-partition $A_1, ..., A_m$ of $X$. The lower Lebesgue sum $L(f, P)$ is defined by:

\[L(f, P) = \sum^m_{j=1} \mu(A_j) \inf_{A_j} f\]

The definition does not assume $X$ is a subset of $\mathbb{R}$.

Definition 3.3: Integral of a Nonnegative Function

Suppose $(X, S, \mu)$ is a measure space and $f: X \rightarrow [0, \infty]$ is an $S$-measurable function. The integral of $f$ w.r.t $\mu$, denoted by $\int f d\mu$ is defined by:

\[\int f d\mu = \sup\{L(f, P): \text{ $P$ is an $S$-partition of $X$}\}\]

Theorem 3.4: Integral of a Characteristic Function $\chi_E$

Suppose $(X, S, \mu)$ is a measure space and $E \in S$. Then:

\[\int \chi_E d\mu = \mu(E)\]

Measure Integral and Real Analysis (1)

Posted on 2022-02-13 Edited on 2022-03-08 In Background Views:

Symbols count in article: 40k Reading time ≈ 36 mins.

Measure Integral and Real Analysis (1)

$X / A$ means that the difference between set $X$ and $A$.

Fields

Complete Ordered Fields

Definition 0.1: Field

A field is a set $\mathbb{F}$ along with closed operations of addition and multiplication on $\mathbb{F}$ that have the following properties:

Commutativity: \[a + b = b + a, \quad ab = ba \quad \forall a, b \in \mathbb{F}\]
Associativity: \[(a + b) + c = a + (b + c) \quad (ab)c = a(bc) \quad \forall a,b,c \in \mathbb{F}\]
Distributive Property: \[a(b + c) = ab + ac \quad \forall a, b, c \in \mathbb{F}\]
Additive Identity: There exists an element $0 \in \mathbb{F}$ s.t. $a + 0 = a, \forall a \in \mathbb{F}$
Additive Inverse: For each $a \in \mathbb{F}$, there exists an element $-a \in \mathbb{F}$ such that $a + (-a) = 0$.
Multiplicative Identity: There exists an element $1 \in \mathbb{F}$ s.t $1 \neq 0$ and $a1 = a, \forall a \in \mathbb{F}$.
Multiplicative Inverse: For each $a \in \mathbb{F}$ with $a \neq 0$, there exists an element $a^{-1} \in \mathbb{F}$ s.t $aa^{-1} = 1$

$\mathbb{Q}, \mathbb{R}, \mathbb{C}, \{0, 1\}$ with usual operation of addition and multiplication are fields.

Linear Algebra (5)

Posted on 2022-02-04 Edited on 2022-03-06 In Background Views:

Symbols count in article: 24k Reading time ≈ 22 mins.

Linear Algebra (5)

Operators on Complex Vector Spaces

Null Spaces of Powers of an Operator

Theorem 8.2: Sequence of Increasing Null Spaces

Suppose $T \in L(V)$. Then

\[\{0\} = \text{null }T^0 \subseteq \text{null }T^1 \subseteq ... \subseteq \text{null }T^k \subseteq .... \]

where $T^0 = I$ is the identity mapping in $V$.

Proof of Theorem 8.2:

Suppose $k$ is non-negative integer and $v \in \text{null }T^k$, then $T^k(v) = 0 \implies T^{k+1}(v) = TT^k(v) = T(0) = 0$, thus, $\forall v \in \text{null }T^k$, $v \in \text{null }T^{k+1}$.

Kernels

kernel Functions

Positive Definite Kernel

Definition 1: Gram Matrix

Definition 2: Positive Definite Kernel

Variational Inference: A Review for Statisticians

General Problem Setting:

Evidence Lower Bound

Property 1: ELBO is the Balance Between Likelihood and Prior

Property 2: ELBO is the Lower Bound of the Log Evidence

REF

Time Series (3)

Spectral Analysis and Filtering

Rigorous Probability (2)

Conditional Expectations

Elementary Conditional Probabilities

Definition 8.2 Conditional Probability

Theorem 8.4: \(P(\cdot | B)\) is a Probability Measure

Theorem 8.5: Conditional Probability of Independent Events

Measure Integral and Real Analysis (4)

Hilbert Spaces

Inner Product Spaces

Definition 8.1: Inner Product; Inner Product Space

Theorem 8.3: Basic Properties of an Inner Product

Definition 8.4: Norm Associated with an Inner Product; \(\|\cdot\|\)

Theorem 8.6: Homogeneity of the Norm

Definition 8.7: Orthogonal

Definition 8.24: Distance from a Point to a Set

Measure Integral and Real Analysis (3)

Banach Spaces

Metric Spaces

Definition 6.1: Metric Space

Definition 6.3: Open Ball; \(B(f, r)\); Closed Ball \(\bar{B}(f, r)\)

Definition 6.4: Open

Theorem 6.5: Open balls are Open

Proof of Theorem 6.5:

Definition 6.6: Closed

Definition 6.7: Closure; \(\bar{E}\)

Definition 6.8: Limit in Metric Space; \(\lim_{k \rightarrow \infty} f_k\)

Rigorous Probability (1)

Preliminaries

Class of Sets

Definition 1.1: \(\cap\)-closed, \(\sigma-\cap\)-closed, \(\cup\)-closed, \(\sigma-\cup\)-closed, \(/\)-closed, \(A^c\)-closed

Definition 1.2: \(\sigma-\)algebra

Definition 1.6: Algebra

Measure Integral and Real Analysis (2)

Integration

Integration w.r.t a Measure

Definition 3.1: \(S\)-Partition

Definition 3.1: Lower Lebesgue Sum

Definition 3.3: Integral of a Nonnegative Function

Theorem 3.4: Integral of a Characteristic Function \(\chi_E\)

Measure Integral and Real Analysis (1)

Fields

Complete Ordered Fields

Definition 0.1: Field

Linear Algebra (5)

Operators on Complex Vector Spaces

Null Spaces of Powers of an Operator

Theorem 8.2: Sequence of Increasing Null Spaces

Proof of Theorem 8.2: