C. Sequence Comparison and Alignment:

1. Computational problem with a pair of aligned sequences

Intuitively, two similar sequences can be lined up so that identical bases (or amino acids) fall in matching positions.

However, from a computer's point of view, the alignment process is far from trivial.

If gaps are allowed, the number of different alignments possible for any two sequences is enormous, and it grows very rapidly with the length of the sequences.

Seq 1: TT - ACTTGCC
Seq 2: ATGAC - - GAC
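To get a feel for how large this number is, the short Python sketch below counts the gapped alignments of two sequences from their lengths alone. It assumes that every column of an alignment is either a base paired with a base, a base in Seq 1 opposite a gap, or a base in Seq 2 opposite a gap, and that two alignments differing only in the order of adjacent gaps count as different. The function name count_alignments is chosen here for illustration and is not part of the original notes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m, n):
    """Count gapped alignments of sequences with lengths m and n.

    Assumption: each column is (base, base), (base, gap) or (gap, base),
    and the order of adjacent gap columns matters.
    """
    if m == 0 or n == 0:
        # Only one way remains: pair every leftover base with a gap.
        return 1
    return (
        count_alignments(m - 1, n - 1)  # last column pairs two bases
        + count_alignments(m - 1, n)    # last column: base in Seq 1, gap in Seq 2
        + count_alignments(m, n - 1)    # last column: gap in Seq 1, base in Seq 2
    )


if __name__ == "__main__":
    # Lengths of the two example sequences once the gaps are removed.
    print(count_alignments(len("TTACTTGCC"), len("ATGACGAC")))
    # The count climbs steeply as the sequences get longer.
    for length in (5, 10, 20, 40):
        print(length, count_alignments(length, length))
```

Even for sequences of a few dozen bases, the count is far beyond what could be scored one alignment at a time.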

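A "dynamic programming" method finds an alignment that is optimal for a given set of scores and penalties without testing every possibility: it builds the best alignment of the full sequences from the best alignments of their shorter prefixes.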

By altering penalties for gaps vs. mismatches, it is possible to use this technique to give very good alignments without requiring tremendous computing power.
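The idea can be sketched in Python as a basic global alignment in the style of Needleman and Wunsch. This is a minimal illustration, not the code of any particular alignment program: the function name needleman_wunsch, the default scores (match = 1, mismatch = -1, gap = -2) and the simple per-position gap penalty are all assumptions made for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b.

    The scoring values are illustrative assumptions; every gap position
    costs the same fixed penalty.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Compare a harsh and a mild gap penalty on the example sequences.
    for gap in (-2, -0.5):
        s, x, y = needleman_wunsch("TTACTTGCC", "ATGACGAC", gap=gap)
        print("gap penalty", gap, "score", s)
        print("Seq 1:", x)
        print("Seq 2:", y)
```

The table has only (n + 1) x (m + 1) cells, so the work grows with the product of the two sequence lengths rather than with the number of possible alignments. Running the sketch with different gap and mismatch values shows how the choice of penalties changes which alignment is reported as best.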

Also, remember that the "optimal" alignment according to the computer is often not the "correct" biological alignment.

2. Multiple Sequence Alignments

When the alignment problem is expanded to multiple sequences, we are once again confronted with a computationally huge problem.

Seq 1: TT - ACTTGCC
Seq 2: ATGAC - - GAC
Seq 3: CT - AGCCTGA

Rather than trying to align all of the sequences optimally at once (the number of possible alignments grows explosively with each added sequence), a simplified heuristic algorithm known as "progressive alignment" is used:
a. rank all sequences according to their similarity to each other
b. align the two most similar sequences using the pairwise algorithm
c. create a consensus sequence from this pairwise alignment
d. take the next most similar sequence and align it to the consensus, remembering to insert any new gaps into the individual sequences that make up the consensus
e. continue to add sequences until all are aligned
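The steps above can be sketched in Python as follows. This is a deliberately simplified illustration under several assumptions that are not in the notes: similarity is measured by the pairwise alignment score, the consensus takes the most common base in each column (ignoring gaps), and the remaining sequences are added in order of their best pairwise score. All function names are hypothetical, and real programs use more elaborate guide trees and profile scoring.

```python
from collections import Counter
from itertools import combinations


def pairwise(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    sub = lambda i, j: match if a[i - 1] == b[j - 1] else mismatch
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(i, j),
                          s[i - 1][j] + gap, s[i][j - 1] + gap)
    ra, rb, i, j = [], [], n, m
    while i > 0 or j > 0:  # trace back one best path
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(i, j):
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return s[n][m], "".join(reversed(ra)), "".join(reversed(rb))


def consensus(rows):
    """Most common base in each column, ignoring gaps (assumed rule)."""
    return "".join(
        Counter(c for c in col if c != "-").most_common(1)[0][0]
        for col in zip(*rows)
    )


def add_sequence(rows, seq):
    """Align seq to the consensus and copy any new gaps into every row."""
    _, cons_aln, seq_aln = pairwise(consensus(rows), seq)
    new_rows = [[] for _ in rows]
    k = 0  # current column in the existing alignment
    for c in cons_aln:
        for r, row in enumerate(rows):
            # A gap in the aligned consensus is a newly inserted column.
            new_rows[r].append("-" if c == "-" else row[k])
        if c != "-":
            k += 1
    return ["".join(r) for r in new_rows] + [seq_aln]


def progressive_align(seqs):
    """Return (order, rows): the order sequences were added and the alignment."""
    scores = {(i, j): pairwise(seqs[i], seqs[j])[0]
              for i, j in combinations(range(len(seqs)), 2)}
    i, j = max(scores, key=scores.get)  # most similar pair
    _, row_i, row_j = pairwise(seqs[i], seqs[j])
    rows, order = [row_i, row_j], [i, j]
    best = lambda k: max(v for pair, v in scores.items() if k in pair)
    for k in sorted((k for k in range(len(seqs)) if k not in (i, j)),
                    key=best, reverse=True):
        rows = add_sequence(rows, seqs[k])
        order.append(k)
    return order, rows


if __name__ == "__main__":
    order, rows = progressive_align(["TTACTTGCC", "ATGACGAC", "CTAGCCTGA"])
    for k, row in zip(order, rows):
        print("Seq %d: %s" % (k + 1, row))
```

Because each new sequence is aligned only to the consensus, gaps placed early are never revisited, which is what makes the method fast and also what makes it a heuristic.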

One result of this type of algorithm is that the order in which the sequences are added to the alignment can affect the final outcome.

Also, since the problem is so complex, it is quite difficult to mathematically define a truly optimal alignment of multiple sequences.

Bharat Patel, Biomolecular & Physical Sciences, Griffith University
Comments to: b.patel@griffith.edu.au