E. DNA sequencing

*** One skilled technician with an automated DNA sequencer can produce over 20 kb of raw sequence data per day. Here is a sample of the data produced by an automated sequencer.

*** The real challenge of DNA sequencing is in the analysis of the data.

*** DNA sequences are read in chunks of about 500 base pairs (bp).

*** Genes are typically tens of thousands of bp long, so these 500 bp reads must be overlapped and assembled into much longer segments known as "contigs".
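
The overlap step can be illustrated with a minimal Python sketch. It assumes error-free reads on the same strand and joins two reads when the end of one exactly matches the start of the other. The function names, the short example reads, and the min_overlap parameter are illustrative assumptions, not part of any particular assembly package.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one matches.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads into one contig if they overlap, otherwise return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep all of `left` and only the non-overlapping tail of `right`.
    return left + right[size:]


# Two made-up reads that share the six bases CGTTAG.
read_a = "ATGCGTACGTTAG"
read_b = "CGTTAGGCTAAC"
contig = merge_reads(read_a, read_b)
print(contig)
```

Real reads are about 500 bp and contain errors, so practical software scores approximate matches rather than requiring exact ones.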

*** These 500 bp reads also contain errors: incorrectly determined bases as well as insertions and deletions. Here is some sample data.
*** Even worse, the error rate is highest at the beginning and end of each read, precisely the regions that must be overlapped.
*** Another complication is that sequence from the cloning vector is often present at the ends of sequence reads.
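
Both problems are usually handled by trimming reads before assembly. The Python sketch below shows one simple way this might be done. It assumes the vector sequence next to the cloning site is known and that each base comes with a numeric quality score; the scores, the threshold of 20, the example sequences, and the function names are all hypothetical and chosen only for illustration.

```python
def trim_vector(read, vector_end, min_match=8):
    """Remove leading cloning-vector sequence from a read.

    Looks for the longest suffix of `vector_end` (the vector sequence next to
    the cloning site) that appears at the very start of the read.
    """
    for size in range(min(len(read), len(vector_end)), min_match - 1, -1):
        if read.startswith(vector_end[-size:]):
            return read[size:]
    return read  # no vector found, leave the read unchanged


def trim_low_quality(read, qualities, threshold=20):
    """Clip bases from both ends until a base meets the quality threshold."""
    start = 0
    end = len(read)
    while start < end and qualities[start] < threshold:
        start += 1
    while end > start and qualities[end - 1] < threshold:
        end -= 1
    return read[start:end]


# Made-up vector sequence and read; the read starts with 12 vector bases.
vector_end = "GGATCCTCTAGAGTCGAC"
raw_read = "TCTAGAGTCGAC" + "ATGGCCATTGTA"
insert_only = trim_vector(raw_read, vector_end)

# Hypothetical per-base quality scores: poor at both ends, good in the middle.
scores = [5, 12, 30, 35, 40, 40, 38, 36, 31, 18, 9, 4]
clean_read = trim_low_quality(insert_only, scores)
print(insert_only, clean_read)
```

Trimming this way shortens the usable overlap between reads, which is one reason the ends of reads are so troublesome for assembly.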

*** Because this problem is so critical, a lot of effort has been put into developing software to aid DNA sequencing projects.

*** Depending on how much they trust the sequence assembly software, researchers have taken one of three different approaches to planning sequencing projects.
*** People who don't trust the software use a "directed cloning" strategy, carefully preparing ordered overlapping fragments.
*** A second strategy, known as "primer walking", requires very fast and accurate analysis of sequence reads, since each sequencing reaction uses information from the previous read.
*** A third strategy, known as "shotgun sequencing", takes maximum advantage of the speed and low cost of automated sequencing, but relies totally on software to assemble a jumble of sequence reads into a coherent and accurate contig.
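
To give a feel for what shotgun assembly software has to do, here is a toy greedy assembler in Python. It is only a sketch under strong assumptions: the reads are error-free, all come from the same strand, and the target contains no repeats. It is not the algorithm used by any specific assembly program, and the function names, the 25 bp target, and the min_overlap parameter are made up for the example.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def drop_contained(seqs):
    """Remove duplicates and any sequence fully contained in another one."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads, min_overlap=4):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = drop_contained(reads)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join; report several separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs


# A made-up 25 bp target, "shotgunned" into four 10 bp reads in jumbled order.
target = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [target[start:start + 10] for start in (10, 0, 15, 5)]
contigs = greedy_assemble(reads)
print(contigs)
```

With real data the reads number in the thousands, come from both strands, and contain errors and repeats, which is why the shotgun strategy depends so heavily on the quality of the assembly software.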

*** The Institute for Genomic Research (TIGR) has demonstrated the power and utility of the shotgun approach by determining the complete genomic sequences of Haemophilus influenzae, Methanococcus jannaschii, Mycoplasma genitalium, Archaeoglobus fulgidus, Deinococcus radiodurans, and Thermotoga maritima.

Computers In Molecular Microbiology
Bharat Patel, School of Biomolecular & Biomedical Sciences, Griffith University
Comments to: B.Patel@griffith.edu.au