Decompose the joint probability density function into marginal and conditional densities.
Let $r_1, r_2, \dots, r_T$ be a series of $T$ returns.
We can describe this return time series with a specific parametric model that depends on a parameter vector $\theta$. This implies a joint probability density function $f$, which reads:
$f(r_T, r_{T - 1}, \dots, r_2, r_1 \space ; \space \theta)$
Any joint probability density function can be factored into a product of conditional densities and one marginal density:
$$ f(r_T, r_{T - 1}, \dots, r_2, r_1 \space ; \space \theta) = \left[ \prod_{t = 2}^{T} f(r_t \space | \space r_{t - 1}, \dots, r_1 \space ; \space \theta) \right] \cdot f(r_1 \space ; \space \theta) $$
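As a minimal sketch of this factorization, the log of the joint density can be accumulated as the log marginal of $r_1$ plus the sum of log conditional densities. The helper names `log_marginal` and `log_conditional`, and the i.i.d. normal placeholder model in the usage example, are hypothetical choices for illustration and are not taken from the text.

```python
import numpy as np
from scipy import stats

def joint_log_density(returns, log_marginal, log_conditional, theta):
    """log f(r_T, ..., r_1; theta) built as
    log f(r_1; theta) + sum_{t=2}^{T} log f(r_t | r_{t-1}, ..., r_1; theta)."""
    r = np.asarray(returns)
    total = log_marginal(r[0], theta)                 # log f(r_1; theta)
    for t in range(1, len(r)):
        # log f(r_t | r_{t-1}, ..., r_1; theta): conditions on the full history r_1, ..., r_{t-1}
        total += log_conditional(r[t], r[:t], theta)
    return total

# Usage with an i.i.d. normal placeholder model (assumption, for illustration only):
# the conditional density ignores the history, so the product collapses to T marginals.
theta = {"mu": 0.0, "sigma": 0.2}
log_marg = lambda r1, th: stats.norm.logpdf(r1, th["mu"], th["sigma"])
log_cond = lambda rt, past, th: stats.norm.logpdf(rt, th["mu"], th["sigma"])
print(joint_log_density([0.01, -0.02, 0.03], log_marg, log_cond, theta))
```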
Decompose the joint probability density function of a Markov process into marginal and conditional densities.
We can also decompose the joint probability density function $f$ from D.1 into marginal and conditional densities when the returns follow a Markov process; in that case each conditional density depends only on the previous return:
$$ f(r_T, r_{T - 1}, \dots, r_2, r_1 \space ; \space \theta) \overset{\text{Markov}}{=} \left[ \prod_{t = 2}^{T} f(r_t \space | \space r_{t - 1} \space ; \space \theta) \right] \cdot f(r_1 \space ; \space \theta) $$
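One concrete Markov example (an assumption chosen here for illustration, not a model specified in the text) is a stationary Gaussian AR(1), where the conditional density of $r_t$ depends only on $r_{t-1}$. The sketch below evaluates the exact log-likelihood as the stationary marginal of $r_1$ plus the sum of first-order conditional log densities.

```python
import numpy as np
from scipy import stats

def ar1_log_likelihood(returns, c, phi, sigma):
    """Exact Gaussian AR(1) log-likelihood: r_t = c + phi * r_{t-1} + eps_t, eps_t ~ N(0, sigma^2).
    Requires |phi| < 1 so that the stationary marginal of r_1 exists."""
    r = np.asarray(returns)
    # Stationary marginal: r_1 ~ N(c / (1 - phi), sigma^2 / (1 - phi^2))
    ll = stats.norm.logpdf(r[0], loc=c / (1 - phi), scale=sigma / np.sqrt(1 - phi**2))
    # Markov conditionals: f(r_t | r_{t-1}; theta) = N(c + phi * r_{t-1}, sigma^2)
    ll += stats.norm.logpdf(r[1:], loc=c + phi * r[:-1], scale=sigma).sum()
    return ll

print(ar1_log_likelihood([0.01, -0.02, 0.03, 0.00], c=0.0, phi=0.3, sigma=0.2))
```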
State the asymptotic distribution of any Maximum Likelihood Estimate.
Let $\hat{\theta}_{ML}$ be our ML estimator of the true parameter vector $\theta$.
In large samples (small-sample behavior is not well understood), the estimator is asymptotically normal: $\hat{\theta}_{ML} \sim N[\theta, \space I(\theta)^{-1}]$, where $I(\theta)$ is the Information Matrix.
That means that $\hat{\theta}_{ML}$ follows a normal distribution centered around the true population parameter $\theta$, with a covariance matrix equal to the inverse of the Information Matrix.
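A small numeric sketch of this result, under the assumption of $T$ i.i.d. normal returns (an illustrative choice, not a model from the text): the ML estimates of $(\mu, \sigma^2)$ have closed forms, the Information Matrix is $I(\theta) = \operatorname{diag}(T/\sigma^2, \; T/(2\sigma^4))$, and its inverse supplies the asymptotic variances used as standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: T i.i.d. normal returns with illustrative true parameters.
T, mu_true, sigma_true = 2_000, 0.05, 0.20
r = rng.normal(mu_true, sigma_true, size=T)

# ML estimates for the normal model (closed form; np.var's default 1/T divisor is the ML one).
mu_hat = r.mean()
var_hat = r.var()

# Inverse Information Matrix evaluated at the estimates -> asymptotic covariance of (mu_hat, var_hat).
I_inv = np.diag([var_hat / T, 2.0 * var_hat**2 / T])
se_mu, se_var = np.sqrt(np.diag(I_inv))

print(f"mu_hat  = {mu_hat:.4f}  (asymptotic s.e. {se_mu:.4f})")
print(f"var_hat = {var_hat:.4f}  (asymptotic s.e. {se_var:.5f})")
```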