● Name some distances for comparing strings: Hamming distance (for strings of the same length), Levenstein Distanse (edit distance).
● What is the difference between local and global pair-wise sequence alignment?: Global alignment: given two strings, we want to align the whole strings. Local alignment: we have two long strings and we want to find similar substrings and align them.
● How is the edit distance defined?: minimum number of delete, insert or replace operations to transform string x into string y.
● What's the definition of the Hamming distance?: Hamming distanse counts the number of characters on the same positions in the word that differ.
● How do we define an optimal pair-wise sequence alignment?: Alignment with minimal Hamming distance.
● Outline how a pair-wise sequence alignment algorithm that uses dynamic programming works!: Needleman-Wunsch algorith: Filling the table with the minimal edit distances between substrings. Initial state uses two empty strings, and increase the string size step by step. The final minimal edit distance is stored in bottom right corner of the table. We also need to find an optimal path through the table.
● What's the difference between the Needleman-Wunsch and Smith-Waterman algorithms?: Needleman-Wunsch finds the minimal edit distanse between whole strings, and Smith-Waterman finds the most similar substring between two strings. Therefore, Needleman-Wunsch finds the optimal global alignment, and the Smith-Waterman finds optimal local alignment.
● What is their time and space complexity?: They are both dynamic programming algorithms, based on the filling a two dimentional table. So, they need a space for the table - O(nm) space. Both algorithms are performing one calculation pro cell - also O(nm) time complexity.
● What is a substitution matrix?: Not all substitutions are equally likely. We know the probabilities of the letter substitution and take them into account. Substitution matrix contains the positive and negative values thar represent more likely and less likely substitutions.