The goal of maximum likelihood is to find the parameter values that define the distribution which maximizes the probability of observing the data.
Maximum likelihood estimation is a method that determines values for the parameters of a model. The parameter values are found such that they maximize the likelihood that the process described by the model produced the data that were actually observed.
It answers the example question: which values of $\mu$ and $\sigma^2$ maximize the likelihood of observing the given data, if we know that the distribution is $N(\mu, \sigma^2)$?
What we want to calculate is the total probability of observing all of the data, i.e. the joint probability distribution of all observed data points.
To do this we would need to calculate some conditional probabilities, which can get very difficult. So it is here that we’ll make our first assumption. The assumption is that each data point is generated independently of the others.
This assumption makes the maths much easier. If the events (i.e. the process that generates the data) are independent, then the total probability of observing all of the data is the product of the probabilities of observing each data point individually (i.e. the product of the marginal probabilities).
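For the $N(\mu, \sigma^2)$ example above, this factorization of the joint likelihood into a product of normal densities reads:

$$ L(\mu, \sigma^2) = \prod_{i=1}^n p(x_i \mid \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2 \sigma^2} \right) $$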
The joint likelihood function for independent observations is the product of the individual likelihood functions. Multiplying many terms is both costly and numerically fragile: the running product quickly becomes extremely large or extremely small, which causes overflow or underflow. As an example, the likelihood function for a Gaussian GARCH(1,1) model (with an AR(1) mean equation) is:
$$ L_T(\phi_0, \phi_1, \alpha_0, \alpha_1, \beta_1, \sigma_1) = \prod_{t=2}^T \frac{1}{\sqrt{ 2 \pi (\alpha_0 + \alpha_1 \epsilon^2_{t-1} + \beta_1 \sigma^2_{t-1})}} \times \exp\left( -\frac{(r_t - [\phi_0 + \phi_1 r_{t-1}])^2}{2 (\alpha_0 + \alpha_1 \epsilon^2_{t-1}+ \beta_1 \sigma^2_{t-1})} \right) $$
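To see the numerical problem concretely, here is a minimal Python sketch that evaluates this likelihood directly as a raw product. The simulated returns, the parameter values, and the initialisation $\epsilon_1 = r_1 - \phi_0$ are assumptions chosen purely for illustration, not part of the original text.

```python
import numpy as np

def garch11_likelihood(r, phi0, phi1, alpha0, alpha1, beta1, sigma1_sq):
    """Evaluate the Gaussian GARCH(1,1) likelihood L_T above as a raw product.

    r         : observed returns r_1, ..., r_T
    sigma1_sq : initial conditional variance sigma_1^2 (taken as given)
    """
    T = len(r)
    sigma_sq = sigma1_sq            # sigma_{t-1}^2, starts at sigma_1^2
    eps = r[0] - phi0               # assumed initialisation: eps_1 = r_1 - phi_0
    likelihood = 1.0
    for t in range(1, T):           # corresponds to t = 2, ..., T in the formula
        var_t = alpha0 + alpha1 * eps**2 + beta1 * sigma_sq
        resid = r[t] - (phi0 + phi1 * r[t - 1])
        density = np.exp(-resid**2 / (2.0 * var_t)) / np.sqrt(2.0 * np.pi * var_t)
        likelihood *= density       # long products drift toward 0 or inf
        eps, sigma_sq = resid, var_t
    return likelihood

# Illustrative call on simulated returns; with thousands of observations the
# raw product typically overflows to inf or underflows to 0.0.
rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, size=5000)
print(garch11_likelihood(r, phi0=0.0, phi1=0.1,
                         alpha0=1e-5, alpha1=0.05, beta1=0.9, sigma1_sq=1e-4))
```

Even with well-behaved data the raw product leaves the range of floating-point numbers after a few thousand observations, which is exactly the motivation for working with the log-likelihood instead.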
A good alternative is the log of the joint likelihood. An important property of the log is that it is a monotonically increasing function, so the parameter values that maximize the log-likelihood also maximize the likelihood itself. With the log we achieve two important things: