Assignment 2

In this assignment you are given metagenomic sequence data. The data are real, unpublished sequences generated by sequencing the inserts of BAC clones constructed from the DNA of a toxic cyanobacterial bloom (Cylindrospermopsis species) collected from the North Pine River Dam. You are asked to identify the many open reading frames (ORFs) present in the sequenced region and to predict their functions.


Click here to download the raw data 1 for this analysis

Click here to download the raw data 2 for this analysis


1) Identify genes using BlastX search

Perform a Blast search on the sequence contig using the BlastX program and the nr database with the default parameters. Record the results you obtain (location and highest match for each gene identified).
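Recording the location and highest match by hand is error-prone when BlastX returns many hits. The sketch below shows one way to tabulate them, assuming the results have been saved as a tab-separated hit table with the standard 12 columns (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The function names and the 50% overlap cut-off are illustrative choices, not part of the assignment.

```python
import csv

# The 12 standard columns of a BLAST tabular hit table.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        qstart, qend = int(hit["qstart"]), int(hit["qend"])
        # BlastX reports reverse-strand hits with qstart > qend.
        hit["start"], hit["end"] = min(qstart, qend), max(qstart, qend)
        hit["strand"] = "+" if qstart <= qend else "-"
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


def shared_bases(a, b):
    """Number of contig positions covered by both hits."""
    return max(0, min(a["end"], b["end"]) - max(a["start"], b["start"]) + 1)


def best_hit_per_region(hits, max_overlap=0.5):
    """Greedily keep the highest-scoring hit for each region of the contig.

    max_overlap is an assumed cut-off: a hit is dropped when more than this
    fraction of its length is already covered by a better-scoring hit.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h["bitscore"], reverse=True):
        length = hit["end"] - hit["start"] + 1
        if all(shared_bases(hit, k) / length <= max_overlap for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h["start"])
```

Calling `best_hit_per_region(read_hits(open("hits.tsv")))` (the file name is hypothetical) returns one entry per region, each with `start`, `end`, `strand`, `sseqid` and `bitscore`, which can be copied into your results table.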

2) Identify genes using GENEMARK HMM

Go to the Genemark homepage and click on ‘Heuristic approach for gene prediction in prokaryotes’. Paste your contig sequence into the sequence text box. Under options, make sure you select ‘Translate predicted genes into protein’. Then start the search. Record the results you obtain (location and size of each ORF).
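To see what "location and size of each ORF" means in practice, a naive six-frame ORF scan can be sketched as follows. This is not the method Genemark uses (Genemark scores candidate genes with a statistical model); it simply reports every ATG-to-stop stretch above a minimum length, ignores alternative start codons such as GTG and TTG, and is meant only as a rough check on the coordinates you record. All function names are illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_aa):
    """Yield (start, end, aa_length) for ATG..stop ORFs in three frames.

    start is 0-based, end is exclusive and includes the stop codon;
    aa_length counts the encoded amino acids without the stop.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                aa_length = (i - start) // 3
                if aa_length >= min_aa:
                    yield start, i + 3, aa_length
                start = None


def find_orfs(seq, min_aa=75):
    """Return sorted (start, end, strand, aa_length) tuples.

    Coordinates are 1-based and inclusive on the forward strand,
    whichever strand the ORF lies on.
    """
    seq = seq.upper()
    n = len(seq)
    found = []
    for s, e, aa in orfs_in_strand(seq, min_aa):
        found.append((s + 1, e, "+", aa))
    for s, e, aa in orfs_in_strand(reverse_complement(seq), min_aa):
        found.append((n - e + 1, n - s, "-", aa))
    return sorted(found)
```

Expect the two lists to differ: a scan like this reports many spurious ORFs that Genemark rejects, and it misses genes that begin with a non-ATG start codon.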

3) Perform BLASTP searches on predicted ORFs

By copying and pasting the sequences, perform a Blast search using the BlastP program for each predicted ORF that is greater than 75 amino acids in length. Record the results you obtain.
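Selecting the ORFs that pass the length cut-off can be done by eye, but a short script avoids miscounting. The sketch below assumes the translated ORFs have been saved in FASTA format; the record names in the usage example are invented, and a trailing `*` (sometimes used to mark the stop codon) is not counted as an amino acid.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> sequence."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = (line[1:].split() or ["unnamed"])[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def orfs_to_blast(records, min_aa=75):
    """Keep proteins strictly longer than min_aa amino acids."""
    return {n: s for n, s in records.items() if len(s.rstrip("*")) > min_aa}


# Invented example: gene_1 has 80 residues, gene_2 exactly 75.
example = ">gene_1 hypothetical\n" + "M" + "A" * 79 + "\n>gene_2\n" + "M" + "A" * 74 + "\n"
for name, protein in orfs_to_blast(read_fasta(example)).items():
    print(name, len(protein.rstrip("*")))
```

Only `gene_1` is kept here, because the assignment asks for ORFs greater than 75 amino acids, so a protein of exactly 75 residues is excluded.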

4) Choose 1 ORF - reconstruct metabolic pathway

Based on the BlastP search results, select one ORF that has a known function (not a hypothetical protein) and try to determine the metabolic pathways this gene product is involved in, as follows.

The first step is to search against a functional database in order to find which protein family your unknown ORF belongs to.

There are several databases that you can use for this purpose. Go to the PUMA homepage (PUMA/) and select similarity search from the top of the page. Choose from the several options and perform your analysis using any of the ORFs that you have identified.

There are alternate databases that you can also use. Some of these are listed below:


Once you have performed the analysis, answer the following questions.

1) Compare the protein matches detected using BLAST on the raw sequence with those detected using BLAST on the GENEMARK-predicted ORFs. What differences are there in the number and types of genes detected?

2) What does this tell us about techniques for gene finding in genome sequences?

3) For those ORFs with a known function, describe which metabolic pathways each is involved in and the specific reaction the gene product catalyses. Do these ORFs appear to be involved in one particular pathway and, if so, what is it? Draw a simple diagram of this metabolic pathway, indicating which steps are catalysed by gene products identified in this analysis.

4) Draw a schematic map of the genome region analysed and indicate the location, orientation and possible function of each ORF identified.

5) What conclusions can you draw about the metabolism from this region of the genome?

6) If there are any hypothetical or conserved unknown genes present, what possible metabolic functions may these gene products be involved in?
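For question 1, the comparison between the BlastX hits on the raw contig and the GENEMARK-predicted ORFs comes down to checking which coordinate ranges overlap. A minimal sketch, assuming both result sets have been reduced to (start, end) pairs in 1-based inclusive contig coordinates; the coordinates in the usage example are invented.

```python
def overlaps(a, b):
    """True when two 1-based inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_predictions(blastx_regions, genemark_orfs):
    """Sort regions into those found by both methods or by only one."""
    both = [orf for orf in genemark_orfs
            if any(overlaps(orf, hit) for hit in blastx_regions)]
    genemark_only = [orf for orf in genemark_orfs if orf not in both]
    blastx_only = [hit for hit in blastx_regions
                   if not any(overlaps(hit, orf) for orf in genemark_orfs)]
    return {"both": both,
            "genemark_only": genemark_only,
            "blastx_only": blastx_only}


# Invented coordinates for illustration only.
summary = compare_predictions(
    blastx_regions=[(1, 300), (631, 900)],
    genemark_orfs=[(10, 310), (1200, 1500)],
)
for label, regions in summary.items():
    print(label, regions)
```

Regions that appear under only one method are the interesting ones to discuss: an ORF with no BlastX match may encode a novel protein or be a false prediction, while a BlastX hit with no ORF may indicate a frameshift or a gene the predictor missed.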

Comments and suggestions to: Professor Bharat Patel
[Created: 03 July 2001]
[Updated: 28 August 2007]