● How do Bayesian methods differ from ML methods?: Maximum Likelihood searches for the single set of parameter values that maximizes the likelihood, whereas Bayesian methods estimate a full posterior probability distribution over the parameters. Bayesian methods therefore do not try to maximize the likelihood; they integrate over it, weighted by the prior, to obtain posterior probabilities.
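In symbols (a minimal sketch, with D the data and θ the parameters):

```latex
% Maximum Likelihood: a single point estimate
\hat{\theta}_{ML} = \arg\max_{\theta} P(D \mid \theta)

% Bayesian inference: a whole posterior distribution, obtained by
% integrating (not maximizing) over the likelihood times the prior
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{\int P(D \mid \theta')\, P(\theta')\, d\theta'}
```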
● What are we trying to approximate?: The posterior probability of each tree topology, which we obtain by integrating over the branch lengths and model parameters of that tree (see the formula below). This integral has no closed form, so in practice we can only approximate it with Monte Carlo integration.
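Roughly (a sketch, with τ a topology, v its branch lengths, and θ the substitution-model parameters):

```latex
P(\tau \mid D) = \frac{\int P(D \mid \tau, v, \theta)\, P(v, \theta \mid \tau)\, P(\tau)\, dv\, d\theta}{P(D)}
```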
● What are the computational difficulties?: Monte Carlo integration approximates the integral by a normalized sum over a large number of sample points of the function. Classical Monte Carlo integration draws these points uniformly at random, but the parameter space is huge and high-dimensional, so a good approximation with uniform random sampling would require far too much computation.
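A minimal one-dimensional illustration of the idea (plain Python, not phylogenetics-specific; the example function and interval are arbitrary): estimate an integral by averaging the function over uniformly random points.

```python
import random
import math

def mc_integrate(f, a, b, n_samples=100_000):
    """Classical Monte Carlo integration of f on [a, b]:
    average the function over uniformly random points and
    rescale by the length of the interval."""
    total = 0.0
    for _ in range(n_samples):
        x = random.uniform(a, b)
        total += f(x)
    return (b - a) * total / n_samples

# Example: integral of exp(-x^2) over [0, 2] (true value is about 0.882)
print(mc_integrate(lambda x: math.exp(-x * x), 0.0, 2.0))
```

In one dimension this converges quickly; the problem above is that in a huge, high-dimensional tree space, uniformly scattered points almost never land where the posterior mass actually is.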
● How does MCMC work in principle?: Random sampling of function values can be very expensive at runtime, and we are not equally interested in all parts of the distribution but mostly in the regions where the function values are high. MCMC lets us sample these regions of interest in more detail: we construct a Markov chain that spends more steps in the most interesting regions of the parameter space. The robot metaphor is commonly used to describe MCMC: we drop a robot at a random place, the robot proposes a direction, and we compute the ratio between the density at the proposed point and the current point. If the slope goes up, we always accept; a small downhill step is usually accepted; a huge downhill step is almost never accepted.
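A minimal Metropolis sketch of the robot (assuming a simple one-dimensional target density for illustration; the target function and step size are arbitrary choices, not part of the original notes):

```python
import random
import math

def target_density(x):
    # Unnormalized target: the "landscape" the robot explores
    return math.exp(-0.5 * x * x)

def metropolis(n_steps=10_000, step_size=1.0):
    x = random.uniform(-5, 5)                  # drop the robot at a random place
    samples = []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step_size)          # propose a move
        ratio = target_density(proposal) / target_density(x)
        # uphill (ratio >= 1): always accept;
        # small downhill: usually accept; big downhill: rarely accept
        if random.random() < min(1.0, ratio):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis()
```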
● How do we compute if we want to accept or reject a proposal?: The decision to accept or reject a proposed move is based on the ratio of the target density at the proposed point to the density at the current point; the move is accepted with probability min(1, ratio).
● Why does this ratio solve a lot of problems?: It is easy to compute, because the marginal probability of the data (the normalizing constant of the posterior) cancels out of the ratio, as shown below.
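Concretely, for the current value θ and the proposed value θ':

```latex
\frac{P(\theta' \mid D)}{P(\theta \mid D)}
  = \frac{P(D \mid \theta')\, P(\theta') / P(D)}{P(D \mid \theta)\, P(\theta) / P(D)}
  = \frac{P(D \mid \theta')\, P(\theta')}{P(D \mid \theta)\, P(\theta)}
```

The intractable marginal probability P(D) cancels, so we never have to compute it.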
● What's the difference between the proposal and the target distribution?: The proposal distribution describes how far and in which direction the robot moves. A proposal distribution with small variance means small steps and a low rejection rate; a proposal distribution with large variance generates longer steps, but also a higher rejection rate. The target distribution is the landscape that we want to describe.
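A small sketch of that trade-off (reusing the same illustrative one-dimensional target as above; the step sizes are arbitrary): measure the acceptance rate for different proposal variances.

```python
import random
import math

def target_density(x):
    return math.exp(-0.5 * x * x)

def acceptance_rate(step_size, n_steps=20_000):
    x, accepted = 0.0, 0
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step_size)
        if random.random() < min(1.0, target_density(proposal) / target_density(x)):
            x = proposal
            accepted += 1
    return accepted / n_steps

for step in (0.1, 1.0, 10.0):
    print(step, acceptance_rate(step))
# Tiny steps: nearly everything is accepted, but the chain crawls.
# Huge steps: most proposals are rejected. Good mixing lies in between.
```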
● What does the term “good mixing” mean?: The chain explores the target distribution efficiently; in practice this means a well-balanced proposal distribution that is neither too timid nor too bold.
● How can we summarize samples?: There are many different ways to do it, for example simple means, weighted sums, or histograms of the sampled values.
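For example (a sketch with toy numbers, imagining a branch length sampled by the chain):

```python
from statistics import mean
from collections import Counter

samples = [0.11, 0.14, 0.13, 0.12, 0.18, 0.12, 0.15]  # toy samples

# Simple mean as a point summary
print("posterior mean:", mean(samples))

# Histogram: count samples per bin to approximate the posterior shape
bins = Counter(round(s, 2) for s in samples)
print("histogram:", dict(bins))
```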
● How does MCMC work in practice for phylogenetics?: We choose a random tree with random branch lengths. In each iteration we apply the same idea as for the robot: propose a new tree topology with one of the tree moves, or propose a new length for some branch, and compute the new likelihood. We iterate over and over, computing the acceptance ratio and accepting or rejecting the proposed changes (see the sketch below).
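In pseudocode-like Python (a sketch only; random_tree, propose_topology_move, propose_branch_length, likelihood, and save_tree are hypothetical helpers, not functions of any particular phylogenetics library, and priors are omitted):

```python
import math
import random

def mcmc_phylo(n_iterations=1_000_000, thin=1_000):
    tree = random_tree()               # hypothetical: random topology, random branch lengths
    current_lnl = likelihood(tree)     # hypothetical: log-likelihood of the tree
    for i in range(n_iterations):
        # Propose either a topology move or a new branch length
        if random.random() < 0.5:
            new_tree = propose_topology_move(tree)    # hypothetical tree move
        else:
            new_tree = propose_branch_length(tree)    # hypothetical branch-length move
        new_lnl = likelihood(new_tree)
        # Acceptance ratio on the log scale (prior and Hastings terms omitted here)
        if math.log(random.random()) < new_lnl - current_lnl:
            tree, current_lnl = new_tree, new_lnl
        # Thinning: keep only every `thin`-th tree
        if i % thin == 0:
            save_tree(tree)            # hypothetical output function
```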
● What is thinning?: Saving only, for example, one tree every thousand iterations, to avoid writing terabytes of output files.
● What is the Hastings ratio and why do we need it?: If we have a “drunk” robot, one that for example proposes more moves to the right than to the left, then the proposal distribution is asymmetric. The Hastings ratio is used as a correction factor in the acceptance ratio to compensate for this asymmetry.
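With an asymmetric proposal distribution q, the acceptance probability is weighted by the Hastings ratio q(θ | θ') / q(θ' | θ):

```latex
\alpha = \min\!\left(1,\;
  \frac{P(D \mid \theta')\, P(\theta')}{P(D \mid \theta)\, P(\theta)}
  \times
  \frac{q(\theta \mid \theta')}{q(\theta' \mid \theta)}
\right)
```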