Decompose the joint probability density function into marginal and conditional densities.
Let $r_1, r_2, \dots, r_T$ be a series of $T$ returns.
We can describe this return time series with a specific parametric model that depends on a parameter vector $\theta$. This implies a joint probability density function $f$, which reads:
$f(r_T, r_{T - 1}, \dots, r_2, r_1 \space ; \space \theta)$
Any joint probability density function can be factored into a product of conditional densities and one marginal density:
$$ f(r_T, r_{T - 1}, \dots, r_2, r_1 \space ; \space \theta) = \left[ \prod_{t = 2}^{T} f(r_t \space | \space r_{t - 1}, \dots, r_1 \space ; \space \theta) \right] \cdot f(r_1 \space ; \space \theta) $$
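As a minimal sketch of this factorization, the log of the joint density can be accumulated as the log marginal of $r_1$ plus the sum of log conditional densities. The helper names `log_marginal` and `log_conditional`, and the i.i.d. normal placeholder model in the usage example, are hypothetical choices for illustration and are not taken from the text.

```python
import numpy as np
from scipy import stats

def joint_log_density(returns, log_marginal, log_conditional, theta):
    """log f(r_T, ..., r_1; theta) built as
    log f(r_1; theta) + sum_{t=2}^{T} log f(r_t | r_{t-1}, ..., r_1; theta)."""
    r = np.asarray(returns)
    total = log_marginal(r[0], theta)                 # log f(r_1; theta)
    for t in range(1, len(r)):
        # log f(r_t | r_{t-1}, ..., r_1; theta): conditions on the full history r_1, ..., r_{t-1}
        total += log_conditional(r[t], r[:t], theta)
    return total

# Usage with an i.i.d. normal placeholder model (assumption, for illustration only):
# the conditional density ignores the history, so the product collapses to T marginals.
theta = {"mu": 0.0, "sigma": 0.2}
log_marg = lambda r1, th: stats.norm.logpdf(r1, th["mu"], th["sigma"])
log_cond = lambda rt, past, th: stats.norm.logpdf(rt, th["mu"], th["sigma"])
print(joint_log_density([0.01, -0.02, 0.03], log_marg, log_cond, theta))
```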
Decompose the joint probability density function of a Markov process into marginal and conditional densities.
We can also decompose the joint probability density function $f$ from D.1 into marginal and conditional densities when the returns follow a Markov process; in that case each conditional density depends only on the previous return:
$$ f(r_T, r_{T - 1}, \dots, r_2, r_1 \space ; \space \theta) \overset{\text{Markov}}{=} \left[ \prod_{t = 2}^{T} f(r_t \space | \space r_{t - 1} \space ; \space \theta) \right] \cdot f(r_1 \space ; \space \theta) $$
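One concrete Markov example (an assumption chosen here for illustration, not a model specified in the text) is a stationary Gaussian AR(1), where the conditional density of $r_t$ depends only on $r_{t-1}$. The sketch below evaluates the exact log-likelihood as the stationary marginal of $r_1$ plus the sum of first-order conditional log densities.

```python
import numpy as np
from scipy import stats

def ar1_log_likelihood(returns, c, phi, sigma):
    """Exact Gaussian AR(1) log-likelihood: r_t = c + phi * r_{t-1} + eps_t, eps_t ~ N(0, sigma^2).
    Requires |phi| < 1 so that the stationary marginal of r_1 exists."""
    r = np.asarray(returns)
    # Stationary marginal: r_1 ~ N(c / (1 - phi), sigma^2 / (1 - phi^2))
    ll = stats.norm.logpdf(r[0], loc=c / (1 - phi), scale=sigma / np.sqrt(1 - phi**2))
    # Markov conditionals: f(r_t | r_{t-1}; theta) = N(c + phi * r_{t-1}, sigma^2)
    ll += stats.norm.logpdf(r[1:], loc=c + phi * r[:-1], scale=sigma).sum()
    return ll

print(ar1_log_likelihood([0.01, -0.02, 0.03, 0.00], c=0.0, phi=0.3, sigma=0.2))
```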
State the asymptotic distribution of any Maximum Likelihood Estimate.
Let $\hat{\theta}_{ML}$ be our ML estimator of the true parameter vector $\theta$.
In large samples (small-sample behavior is not well understood), the estimator is asymptotically normal: $\hat{\theta}_{ML} \sim N[\theta, \space I(\theta)^{-1}]$, where $I(\theta)$ is the Information Matrix.
That means that $\hat{\theta}_{ML}$ follows a normal distribution centered around the true population parameter $\theta$, with a covariance matrix equal to the inverse of the Information Matrix.
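A small numeric sketch of this result, under the assumption of $T$ i.i.d. normal returns (an illustrative choice, not a model from the text): the ML estimates of $(\mu, \sigma^2)$ have closed forms, the Information Matrix is $I(\theta) = \operatorname{diag}(T/\sigma^2, \; T/(2\sigma^4))$, and its inverse supplies the asymptotic variances used as standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: T i.i.d. normal returns with illustrative true parameters.
T, mu_true, sigma_true = 2_000, 0.05, 0.20
r = rng.normal(mu_true, sigma_true, size=T)

# ML estimates for the normal model (closed form; np.var's default 1/T divisor is the ML one).
mu_hat = r.mean()
var_hat = r.var()

# Inverse Information Matrix evaluated at the estimates -> asymptotic covariance of (mu_hat, var_hat).
I_inv = np.diag([var_hat / T, 2.0 * var_hat**2 / T])
se_mu, se_var = np.sqrt(np.diag(I_inv))

print(f"mu_hat  = {mu_hat:.4f}  (asymptotic s.e. {se_mu:.4f})")
print(f"var_hat = {var_hat:.4f}  (asymptotic s.e. {se_var:.5f})")
```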