Gene analysis using molecular biology databases & online computational tools

Small tracts of DNA (less than 5 kb) can now be sequenced more easily than ever before. Even students in their formative years of science (2nd year) are capable of doing so. This has come about through an explosion in technology, knowledge and the availability of kits, which has lowered costs and sped up data throughput. The complete sequencing of 15 bacterial genomes in the last two years is testimony to this. To complement the high-throughput genome sequencing projects, databases and extensive software for data analysis have been developed. Using such tools in conjunction with molecular biology databases makes the internet an extremely powerful learning environment for a range of molecular studies. In addition, it provides useful facts and data for use in several industries.

The topic is designed to familiarize you with, and encourage you to use, a range of online molecular biology tools that are freely and widely available via the internet for sequence analysis. For this, you have been given a DNA sequence (/courses/bbs6017/sequence). You will use the online tools and molecular biology databases to answer the questions set out below. You should include the following points as part of your assignment submission:

  1. Identify the primary sequence of the deduced protein expressed by the gene using 6-frame translation and identify the translation frame.
  2. Does the sequence contain a start codon?
  3. What is the deduced molecular weight and the pI?
  4. If this protein were present in E. coli, could you identify its location on an E. coli 2-D protein map?
  5. Which family / sub-family does it belong to? Mark the conserved amino acids found in the sequence that identify it as belonging to the family / sub-family.
  6. Can you determine whether the deduced protein is secreted or cell associated?
  7. Align the deduced protein with 5 of its nearest neighbors. Determine the percent identity and homology. Differentiate between these two terms.
  8. Proteins are compared on the basis of their domain structures. Can you explain what the term means? Draw a domain structure model for the deduced protein and make a comparison with its relatives.
  9. Draw a 3-D model of the deduced protein.
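
To make points 1 and 2 more concrete, the following is a minimal Python sketch of what a 6-frame translation tool does behind the scenes. It is an illustration only, not a substitute for the online tools. It assumes the standard genetic code, and the function names (`reverse_complement`, `translate`, `six_frame_translation`, `longest_met_peptide`) and the short demo sequence are hypothetical, chosen for this example. The demo sequence is not the assignment sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard table; '*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands at offsets 0, 1 and 2 (frames +1..+3, -1..-3)."""
    seq = "".join(dna.split()).upper()
    frames = {}
    for label, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


def longest_met_peptide(frames):
    """Return (frame, peptide) for the longest stretch that starts with
    methionine (M) and contains no stop. A crude heuristic: it does not
    require that a stop codon actually follows the peptide."""
    best = ("", "")
    for name, protein in frames.items():
        for segment in protein.split("*"):
            start = segment.find("M")
            if start >= 0 and len(segment) - start > len(best[1]):
                best = (name, segment[start:])
    return best


# Toy sequence for demonstration only (not the assignment sequence).
demo = "ATGGCCAAATAA"
demo_frames = six_frame_translation(demo)
for frame_name, peptide in demo_frames.items():
    print(frame_name, peptide)
print("Longest M-initiated peptide:", longest_met_peptide(demo_frames))
```

The frame holding the longest uninterrupted open reading frame is usually the best candidate for the true translation frame, though the online tools and a database search should confirm it. The sketch looks only for ATG (M) as a start. Bacterial genes sometimes initiate at alternative codons such as GTG or TTG, so the absence of an ATG does not by itself rule out a start codon.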