E. DNA sequencing
One skilled technician with an automated DNA sequencer can produce over 20 kb of raw sequence data per day. Here is a sample of the data produced by an automated sequencer.
The real challenge of DNA sequencing is in the analysis of the data.
DNA sequences are read in chunks of about 500 base pairs (bp).
Genes are typically tens of thousands of bp long, so these 500 bp reads must be overlapped and assembled into much longer segments known as "contigs".
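The overlap step can be illustrated with a minimal Python sketch. It assumes error-free reads and exact matching, which real reads do not offer; the function names, the `min_overlap` parameter and the toy sequences are made up for illustration and do not come from any particular assembly package.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the best one.
    for size in range(max_len, max(min_overlap, 1) - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads into one longer segment, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep all of `left`, then append the part of `right` beyond the overlap.
    return left + right[size:]


if __name__ == "__main__":
    read_a = "ATGCGTACGTTAG"   # hypothetical read
    read_b = "CGTTAGGCTA"      # hypothetical read sharing 6 bases with read_a
    print(overlap_length(read_a, read_b))
    print(merge_reads(read_a, read_b))
```

Repeating this merge over many reads is what builds up a contig; real reads are about 500 bp rather than the dozen bases used here.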
These 500 bp reads also contain errors: incorrectly determined bases as well as insertions and deletions. Here is some sample data.
Even worse, the error rate is highest at the beginning and end of each read, precisely the regions that must be overlapped.
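One common way to count such errors is the edit (Levenshtein) distance, which treats a wrong base, an inserted base and a deleted base as one error each. The sketch below is a generic textbook version in Python, not the method of any specific sequencing package, and the example sequences are invented.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty string costs i deletions
        for j, base_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (base_a != base_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    true_sequence = "ACGTACGT"                      # hypothetical correct sequence
    print(edit_distance(true_sequence, "ACGAACGT"))   # one miscalled base
    print(edit_distance(true_sequence, "ACGTCGT"))    # one deleted base
    print(edit_distance(true_sequence, "ACGTTACGT"))  # one inserted base
```

Because insertions and deletions shift every base after them, a simple position-by-position comparison would overcount these errors; the edit distance does not.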
Another complication is that sequence from cloning vectors is often present at the ends of sequence reads.
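Vector removal can be sketched as follows, assuming the vector sequence on each side of the cloning site is known and appears in the read without errors. The flank sequences, the `min_match` threshold and the function name are hypothetical; real trimming software must also tolerate miscalled bases.

```python
def trim_vector(read: str, left_flank: str, right_flank: str, min_match: int = 6) -> str:
    """Strip vector sequence from both ends of a read (exact matches only).

    left_flank:  vector sequence just before the cloning site (may start the read)
    right_flank: vector sequence just after the cloning site (may end the read)
    """
    min_match = max(min_match, 1)  # a zero-length match would be meaningless

    # Leading vector: the read starts with a suffix of the left flank.
    for size in range(min(len(read), len(left_flank)), min_match - 1, -1):
        if read.startswith(left_flank[-size:]):
            read = read[size:]
            break

    # Trailing vector: the read ends with a prefix of the right flank.
    for size in range(min(len(read), len(right_flank)), min_match - 1, -1):
        if read.endswith(right_flank[:size]):
            read = read[:-size]
            break

    return read


if __name__ == "__main__":
    left = "GGATCCTCTAGA"    # hypothetical vector flank
    right = "GTCGACCTGCAG"   # hypothetical vector flank
    raw_read = "TCTAGA" + "ATGGCCATTGTAATGGGCCGC" + "GTCGACC"
    print(trim_vector(raw_read, left, right))
```

Trimming before assembly matters because untrimmed vector sequence is shared by every clone and would create false overlaps between unrelated reads.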
Since this is such a critical problem, a lot of effort has been put into developing software to aid DNA sequencing projects.
Depending on how much they trust the sequence assembly software, researchers have taken one of three different approaches to planning sequencing projects.
People who don't trust the software use a "directed cloning" strategy, carefully preparing ordered overlapping fragments.
A second strategy known as "primer walking" requires very fast and accurate analysis of sequence reads since each sequencing reaction uses information from the previous read.
A third strategy, known as "shotgun sequencing", takes maximum advantage of the speed and low cost of automated sequencing, but relies totally on software to assemble a jumble of sequence reads into a coherent and accurate contig.
The Institute for Genomic Research (TIGR) has demonstrated the power and utility of the shotgun approach by determining the complete genomic sequences of Haemophilus influenzae, Methanococcus jannaschii, Mycoplasma genitalium, Archaeoglobus fulgidus, Deinococcus radiodurans and Thermotoga maritima.
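To give a feel for what the software has to do, here is a toy greedy assembler in Python. It is only a sketch under strong assumptions: reads are error-free, all come from the same strand, none is contained inside another, and there are no repeats. It is not the algorithm used by TIGR or by any named assembly package, and all names and sequences are invented for illustration.

```python
from itertools import permutations
from typing import List


def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Longest suffix of `left` equal to a prefix of `right` (0 if below `min_overlap`)."""
    for size in range(min(len(left), len(right)), max(min_overlap, 1) - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 4) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the remaining contigs; more than one means gaps were left.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, j in permutations(range(len(contigs)), 2):
            size = overlap_length(contigs[i], contigs[j], min_overlap)
            if size > best_size:
                best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Four hypothetical reads from a 25 bp sequence, in jumbled order.
    jumble = ["TTAGCCTAGG", "ATGGCGTACG", "CTAGGATCCA", "GTACGTTAGC"]
    print(greedy_assemble(jumble))
```

With real data the exact-match test would be replaced by an error-tolerant alignment, and repeated sequences can mislead a greedy strategy like this one, which is one reason the choice of sequencing strategy depends on trust in the software.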
Computers In Molecular Microbiology
Bharat Patel, School of Biomolecular & Biomedical Sciences, Griffith University
Comments to: bharat@trishul.sci.gu.edu.au