● What is homology?: homology is similarity between characters of organisms (e.g. genes or DNA sequences) that is due to descent from a common ancestor. Homologous characters of DNA sequences share a common evolutionary history.
● Does sequence similarity imply homology?: No, similar sequences can also evolve independently. This is called convergent evolution.
● Orthology versus Paralogy versus Homology?:
Orthology: two genes in different species that descend from a single gene in their last common ancestor, i.e. that were separated by a speciation event, are called orthologous.
Paralogy: two copies of a gene that were created by gene duplication (e.g. two copies in the same species) are called paralogous.
Homology: genes that are derived from a common ancestral gene are called homologous. Homology includes both orthology and paralogy.
● What's a gene duplication?: gene duplication is the duplication of a region of DNA that contains a gene within one genome. It is a frequent event in evolution.
● How can we assess the quality of an MSA?: Several criteria exist; the most common one is the sum-of-pairs (SP) score.
● How do we compute the SP score?: SP score = sum-of-pairs score. Score each MSA site (column) by summing the scores of all pairs of sequences in that column, then add up the scores over all sites. Mismatches and gaps are penalized, matches earn points. However, the MSA with the maximal SP score is not guaranteed to represent the true evolutionary history.
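A minimal sketch of the SP score, assuming a hypothetical linear scoring scheme (match +1, mismatch -1, character-gap -2, gap-gap 0); real tools typically use substitution matrices and affine gap penalties. The names `pair_score` and `sp_score` are illustrative only.

```python
from itertools import combinations

# Hypothetical scoring scheme, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(a: str, b: str) -> int:
    """Score one pair of characters in a column; '-' denotes a gap."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def sp_score(msa: list[str]) -> int:
    """Sum-of-pairs score: score every column over all row pairs, then sum the columns."""
    length = len(msa[0])
    assert all(len(row) == length for row in msa), "rows must have equal length"
    total = 0
    for col in range(length):
        column = [row[col] for row in msa]
        total += sum(pair_score(a, b) for a, b in combinations(column, 2))
    return total

msa = ["ACG-T",
       "ACGGT",
       "A-GGT"]
print(sp_score(msa))
```

With n sequences each column contributes n(n-1)/2 pair scores, so scoring a given MSA takes O(n^2 · L) time for alignment length L.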
● What are MSAs good for?: An MSA aligns the DNA sequences of different organisms with each other so that each column contains characters with a common evolutionary origin. Such an alignment can be used for phylogenetic reconstruction.
● Can we build an MSA with an optimal SP score?: Yes, with n-dimensional dynamic programming (the generalization of pairwise alignment DP to n sequences), but it requires time and space exponential in the number of sequences n.
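A minimal sketch of the n-dimensional DP, assuming the same kind of hypothetical linear scoring scheme (match +1, mismatch -1, gap -2, gap-gap 0) and a plain dict as the DP table. Each cell is an index tuple (i_1, ..., i_n); a step advances any non-empty subset of the sequences by one character and puts gaps in the others. It returns only the optimal SP score (no traceback) and is only usable for a few short sequences. The function names are hypothetical.

```python
from itertools import combinations, product

# Hypothetical scoring scheme, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(a, b):
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def column_score(column):
    """SP score of one column: sum over all pairs of rows."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def optimal_sp(seqs):
    """Optimal SP score via n-dimensional DP (score only, no traceback)."""
    n = len(seqs)
    origin = (0,) * n
    # A move advances every sequence whose bit is 1; the others get a gap.
    moves = [m for m in product((0, 1), repeat=n) if any(m)]  # 2^n - 1 moves
    best = {origin: 0}
    # product() yields cells in lexicographic order, so every predecessor
    # (cell minus a non-zero 0/1 vector) is computed before the cell itself.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if cell == origin:
            continue
        candidates = []
        for move in moves:
            if any(c < d for c, d in zip(cell, move)):
                continue  # would step before the start of some sequence
            prev = tuple(c - d for c, d in zip(cell, move))
            column = [s[c - 1] if d else "-" for s, c, d in zip(seqs, cell, move)]
            candidates.append(best[prev] + column_score(column))
        best[cell] = max(candidates)
    return best[tuple(len(s) for s in seqs)]

print(optimal_sp(["ACGT", "ACGGT", "AGGT"]))
```

For two sequences this reduces to ordinary global pairwise alignment (Needleman-Wunsch with a linear gap penalty).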
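● What's the time complexity?: if there are n sequences, each of length m, the DP table has (m+1)^n ≈ m^n cells, so the space is O(m^n). Each cell takes the best of its 2^n - 1 predecessor cells (one for every non-empty subset of sequences that advances by one character), so the time is O(2^n · m^n), i.e. exponential in n.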
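To see how quickly this grows, a small arithmetic helper (the name `dp_size` is hypothetical) counts the table cells and the predecessor moves per cell for n sequences of length m = 100:

```python
def dp_size(m, n):
    """Cells of the n-dimensional DP table and predecessor moves per cell,
    for n sequences of length m each."""
    cells = (m + 1) ** n         # index 0..m in each of the n dimensions
    moves_per_cell = 2 ** n - 1  # every non-empty subset of sequences may advance
    return cells, moves_per_cell

for n in (2, 3, 5, 10):
    cells, moves = dp_size(100, n)
    print(f"n={n:2d}: {cells:.2e} cells, {moves} moves per cell")
```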
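● How does the star alignment heuristic work?: pick a center sequence. Align all remaining sequences to the center sequence using a pairwise alignment algorithm and merge these pairwise alignments into one MSA (gaps inserted into the center are propagated to all sequences: "once a gap, always a gap"). The center sequence is found by computing all optimal pairwise alignments (n(n-1)/2, i.e. O(n^2), for n sequences) and selecting the sequence with the largest total similarity to all other sequences. This is an approximation algorithm: for a distance-based SP score that satisfies the triangle inequality, the resulting SP cost is at most (2 - 2/n) times the optimum, i.e. less than twice the optimum. It is rarely used in practice.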
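A sketch of the center star procedure, assuming Needleman-Wunsch global alignment with a hypothetical linear scoring scheme (match +1, mismatch -1, gap -2) as the pairwise algorithm. `merge` adds one pairwise center/sequence alignment to the growing MSA, only ever inserting gap columns; all function names are illustrative, not taken from a specific tool.

```python
from itertools import combinations

# Hypothetical scoring scheme (linear gap penalty), chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2

def sub(x, y):
    return MATCH if x == y else MISMATCH

def needleman_wunsch(a, b):
    """Optimal global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    S = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        S[i][0] = i * GAP
    for j in range(1, len(b) + 1):
        S[0][j] = j * GAP
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + GAP, S[i][j - 1] + GAP)
    # Traceback from the bottom-right corner.
    i, j, ra, rb = len(a), len(b), [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + GAP:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return S[-1][-1], "".join(reversed(ra)), "".join(reversed(rb))

def merge(msa, center_al, seq_al):
    """Add one pairwise (center, seq) alignment to an MSA whose row 0 is the center.
    Gaps are only ever added, never removed ("once a gap, always a gap")."""
    center_row = msa[0]
    rows, new = [[] for _ in msa], []
    i = j = 0
    while i < len(center_row) or j < len(center_al):
        if i < len(center_row) and center_row[i] == "-":
            # Gap column already in the MSA: the new sequence gets a gap here.
            for r, row in zip(rows, msa):
                r.append(row[i])
            new.append("-"); i += 1
        elif j < len(center_al) and center_al[j] == "-":
            # New gap in the center: insert a gap column into every existing row.
            for r in rows:
                r.append("-")
            new.append(seq_al[j]); j += 1
        else:
            # Same center character on both sides: the column is shared.
            for r, row in zip(rows, msa):
                r.append(row[i])
            new.append(seq_al[j]); i += 1; j += 1
    return ["".join(r) for r in rows] + ["".join(new)]

def center_star(seqs):
    """Return (index of the center sequence, MSA rows in input order)."""
    totals = [0] * len(seqs)
    for i, j in combinations(range(len(seqs)), 2):  # n(n-1)/2 pairwise alignments
        score = needleman_wunsch(seqs[i], seqs[j])[0]
        totals[i] += score; totals[j] += score
    c = max(range(len(seqs)), key=lambda k: totals[k])
    msa, order = [seqs[c]], [c]
    for k in range(len(seqs)):
        if k != c:
            _, center_al, seq_al = needleman_wunsch(seqs[c], seqs[k])
            msa = merge(msa, center_al, seq_al)
            order.append(k)
    rows = [""] * len(seqs)
    for k, row in zip(order, msa):
        rows[k] = row
    return c, rows

c, rows = center_star(["ACGT", "ACGGT", "AGGT", "ACT"])
print("center:", c)
print("\n".join(rows))
```

Each pairwise alignment between the center and another sequence is preserved exactly in the final MSA; the approximation comes from the pairs that do not involve the center.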
● How is the tree alignment problem defined?: Given: an evolutionary tree whose leaves are labeled with the input sequences. Task: find an assignment of sequences to the inner nodes such that the sum of the similarity scores over all branches is maximized.
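● Can you compute a tree alignment score on a given tree?: Yes, once sequences are assigned to the inner nodes: compute the score (or difference) for every branch, both leaf-to-inner-node and inner-node-to-inner-node, and sum them. A tree has a linear number of branches, so this takes time linear in the size of the tree (times the cost of scoring one branch).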
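A minimal sketch of scoring a fully labeled tree, assuming that all sequences (leaves and inner nodes) are already aligned to the same length and using the number of mismatching positions (Hamming distance) as the per-branch difference. With a distance instead of a similarity, the tree alignment objective is minimized rather than maximized. The tree, the labels and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    assert len(a) == len(b), "sequences must already be aligned"
    return sum(x != y for x, y in zip(a, b))

def tree_score(edges, labels):
    """Sum of branch differences; visits each branch once, so it is linear in the tree size."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)

# Hypothetical tree ((s1, s2)x, s3)root with the inner nodes already labeled.
edges = [("root", "x"), ("root", "s3"), ("x", "s1"), ("x", "s2")]
labels = {"root": "ACGT", "x": "ACGT", "s1": "ACGA", "s2": "ACTT", "s3": "AGGT"}
print(tree_score(edges, labels))
```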
● How do practical approaches for MSA work?: Create a pairwise distance matrix (e.g. from pairwise alignments). Then build a tree from this pairwise distance matrix with a hierarchical clustering approach; this tree is called the guide tree. Then traverse the guide tree from the leaves to the root and build the alignment at each inner node by aligning the sequences or alignments of its children (progressive alignment).
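A sketch of the guide-tree step, assuming UPGMA (average-linkage hierarchical clustering) as the clustering method; other methods such as neighbor joining are also used in practice. The distance matrix below is made up for illustration, and the subsequent leaves-to-root alignment of profiles is not shown.

```python
from itertools import combinations

def upgma(names, matrix):
    """Build a guide tree by UPGMA (average-linkage hierarchical clustering).

    names:  leaf labels, e.g. sequence identifiers
    matrix: symmetric pairwise distance matrix in the same order as names
    Returns the tree as nested tuples of leaf labels.
    """
    # Distances between current clusters, keyed by the unordered pair.
    dist = {frozenset((names[i], names[j])): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    sizes = {name: 1 for name in names}  # cluster (subtree) -> number of leaves
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for c in sizes:
            # Average linkage: leaf-count-weighted mean of the two old distances.
            dist[frozenset((merged, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (tree,) = sizes
    return tree

names = ["A", "B", "C", "D"]
matrix = [[0, 2, 6, 10],
          [2, 0, 6, 10],
          [6, 6, 0, 8],
          [10, 10, 8, 0]]
print(upgma(names, matrix))
```

Reading the nested tuples from the innermost pair outward gives the order in which a progressive aligner would merge sequences: first A with B, then C, then D.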