● Name the two basic classes of reconstruction methods!: distance based and character based reconstruction methods.
● How does UPGMA work in principle?: UPGMA works with the distance matrix. Find the minimum value in the matrix, and merge the corresponding sequences in the subtree with equal branch lengths. Update the distance matrix and continue, until you get the matrix with one number. Time is cubic, because you have no find a minimum value from n^2 cells n times. This time can be reduced with some intelligent data structures. UPGMA produces the ultrametric trees, if the original tree is not ultrametric - and most trees are not ultrametric - UPGMA will produce garbage.
● How does do Neighbor Joining work in principle?: Neighbor Joining works with the distance matrix. We find a two sequences with minimal distances, and create a subtree with these two sequences and one node. Rebuild the matrix, to take this node in account insted of the original sequences. Time complexity is O(n^3), space complexity is O(n^2), because we have to store the matrix in memory.
● Name some methods that are NP-hard: Least-Squares, Parsimony, Likelihood
● How does the least-squares algorithm work?: If the tree topology is given, we can find the optimal branch length. We have to find such values for branch lengths, that are minimizing the difference between the distance matrix and tree topology. So, the least squares score for the given tree can be efficiently found.
It can be used to find the tree topology with minimal least squares score. But the number of the possible topologies is exponential.
● How is the minimum evolution criterion defined?: The minimum evolution method is similar to least squares method. We compute the least squre score for the each tree topology, but make a final choise based on the minimal sum of branch lengths.
● How does parsimony work?: parsimony is a character based three building method. It operates directly on the MSA. It tryes to find the tree that explains the data with the least amount of mutations.
● What's the time and space complexity for computing the parsimony score on a tree?: When we are comparing n sequences with the length m, then we need traverse all the nodes for each site - something like O(nm) time. We are also need to store all sequences in memory - it is O(nm) place.