E. DNA sequencing
A technician with an automated DNA sequencer can produce over 20 kb of raw
sequence data per day. Here
is a sample of the data produced by an automated sequencer.
The real challenge
of DNA sequencing is in the analysis of the data.
Sequences are read in chunks of about 500 base pairs (bp).
Genes are typically
tens of thousands of bp long, so these 500 bp reads must be overlapped
and assembled into much longer segments known as "contigs".
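To give a feel for the scale involved, the arithmetic can be sketched as below. The 30,000 bp gene length and the 50 bp overlap between neighbouring reads are illustrative assumptions, not figures from these notes, and the function name is hypothetical. The result is only a lower bound for perfectly ordered, error-free reads.

```python
import math

def min_reads_to_span(target_bp: int, read_bp: int = 500, overlap_bp: int = 50) -> int:
    """Fewest reads needed to tile target_bp when adjacent reads share overlap_bp.

    Assumes an ideal, perfectly ordered tiling; real projects need many more reads.
    """
    if not 0 <= overlap_bp < read_bp:
        raise ValueError("overlap must be non-negative and shorter than a read")
    if target_bp <= read_bp:
        return 1
    # The first read covers read_bp bases; every later read contributes
    # only (read_bp - overlap_bp) new bases.
    return 1 + math.ceil((target_bp - read_bp) / (read_bp - overlap_bp))

# Hypothetical 30,000 bp gene, 500 bp reads, 50 bp shared between neighbours.
print(min_reads_to_span(30_000, 500, 50))
```

With no overlap at all the same gene would need 60 reads, so even a modest overlap adds noticeably to the amount of sequencing required.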
The 500 bp reads contain errors: incorrectly determined bases as well as
insertions and deletions. Here is some sample
One problem is that the error rate is highest at the beginning and end of
each read, precisely the regions that must be overlapped.
Another is that sequence from cloning vectors is often present at the ends of
sequence reads.
Since this is
such a critical problem, a lot of effort has been put into developing software
to aid DNA sequencing projects.
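A minimal sketch of the kind of clean-up such software performs on each read before assembly. This is not the method of any particular package: the function names are hypothetical, vector matching is exact (real tools tolerate mismatches), and the ambiguous base call "N" is used here as a stand-in for low-quality sequence, which is an assumption.

```python
def trim_vector(read: str, vector_end: str, min_match: int = 8) -> str:
    """Remove a leading stretch of the read that matches the tail of the vector.

    Exact matching only (an assumption); the read is returned unchanged
    if fewer than min_match bases match.
    """
    read = read.upper()
    vector_end = vector_end.upper()
    # Try the longest candidate first: suffix of vector == prefix of read.
    for k in range(min(len(read), len(vector_end)), min_match - 1, -1):
        if read.startswith(vector_end[-k:]):
            return read[k:]
    return read

def trim_ambiguous_ends(read: str, window: int = 10, max_n: int = 1) -> str:
    """Clip each end until its outermost window holds at most max_n 'N' calls.

    Reads that shrink below one window are discarded (empty string returned).
    """
    read = read.upper()
    start, end = 0, len(read)
    while end - start >= window and read[start:start + window].count("N") > max_n:
        start += 1
    while end - start >= window and read[end - window:end].count("N") > max_n:
        end -= 1
    if end - start < window:
        return ""
    return read[start:end]

# Made-up sequences for illustration only.
print(trim_vector("CCTCTAGAATGGCGTACG", "GGATCCTCTAGA"))
print(trim_ambiguous_ends("NNNACGTACGTACGTACGTNNN", window=5, max_n=0))
```

Trimming too aggressively shortens the very regions needed for overlap, so the window size and thresholds are a trade-off rather than fixed values.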
Based on their
faith in the sequence assembly software, researchers have taken one of
three different approaches to planning sequencing projects.
The Institute for Genomic Research (TIGR) has demonstrated the power and
utility of the shotgun approach by determining the complete genomic sequences
of Haemophilus influenzae, Methanococcus jannaschii, Mycoplasma genitalium,
Archaeoglobus fulgidus, Deinococcus radiodurans, and Thermotoga maritima.
Researchers who don't trust the software use a "directed cloning" strategy,
carefully preparing ordered overlapping fragments.
A second strategy,
known as "primer walking", requires very fast and accurate analysis of sequence
reads, since each sequencing reaction uses information from the previous one.
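The dependency between successive reactions can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the template is known in advance (in a real project it is not), reads are error-free, the primer is simply the last few bases of the previous read, and each new read is taken to begin at the primer itself. The function names and the tiny read and primer lengths in the usage lines are hypothetical.

```python
def primer_walk(template: str, read_len: int = 500, primer_len: int = 20) -> list:
    """Simulate primer walking along a template; return the reads in order."""
    if not 0 < primer_len < read_len:
        raise ValueError("primer must be shorter than a read")
    reads = []
    start = 0  # the first reaction starts from a known site
    while True:
        read = template[start:start + read_len]
        reads.append(read)
        if start + read_len >= len(template):
            break  # this read reached the end of the template
        # The next primer can only be designed once this read is analysed.
        primer = read[-primer_len:]
        # The next reaction starts where that primer anneals downstream.
        start = template.find(primer, start + 1)
    return reads

def stitch(reads: list, primer_len: int = 20) -> str:
    """Join walked reads, dropping the primer-length overlap.

    Assumes every primer annealed at its intended, unique site.
    """
    contig = reads[0]
    for read in reads[1:]:
        contig += read[primer_len:]
    return contig

# Made-up 30 bp template with short reads so the walk is easy to follow.
template = "ATGGCGTACGTTAGCCTAGGATCCAAGTCT"
reads = primer_walk(template, read_len=10, primer_len=4)
print(reads)
print(stitch(reads, primer_len=4) == template)
```

Because every step waits on the previous read, an error near the end of one read propagates into the next primer, which is why the analysis has to be both fast and accurate.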
A third strategy,
known as "shotgun sequencing", takes maximum advantage of the speed and low
cost of automated sequencing, but relies totally on software to assemble
a jumble of sequence reads into a coherent and accurate contig.
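What "assembling a jumble of reads" means can be sketched with a simple greedy merge. This is a toy illustration, not the algorithm used by any particular assembly package: it assumes error-free reads, all from the same strand, and exact suffix-to-prefix overlaps. The function names and the min_overlap parameter are hypothetical.

```python
def overlap_len(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of left that equals a prefix of right."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def greedy_assemble(reads, min_overlap: int = 3) -> list:
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_len(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join: more than one contig remains
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

# Three made-up reads supplied out of order.
reads = ["TAGGATCCAAGTCT", "ATGGCGTACGTT", "CGTTAGCCTAGG"]
print(greedy_assemble(reads, min_overlap=4))
```

With real data the overlaps fall in the error-prone ends of the reads and repeated sequences create false joins, so practical assemblers need tolerant matching and much higher coverage than this sketch suggests.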
Computers In Molecular Microbiology
Bharat Patel, School of Biomolecular & Biomedical Sciences, Griffith
Comments to: B.Patel@griffith.edu.au