You are required to analyse a protein sequence using bioinformatics methods. What is its likely function? If it does not have a known structure, can you build a model of the structure? What does the model tell you about its function? It might instead be a “hypothetical” protein whose structure has already been determined by a structural genomics consortium; in that case, can you use the sequence and the known structure to determine the protein's function? You are not restricted to methods covered in the lectures, but you should focus on methods whose general aim is the prediction of protein function and/or structure from sequence.
MSPSVEETTS VTESIMFAIV SFKHMGPFEG YSMSADRAAS DLLIGMFGSV SLVNLLTIIG
CLWVLRVTRP PVSVMIFTWN LVLSQFFSIL ATMLSKGIML RGALNLSLCR LVLFVDDVGL
YSTALFFLFL ILDRLSAISY GRDLWHHETR ENAGVALYAV AFAWVLSIVA AVPTAATGSL
DYRWLGCQIP IQYAAVDLTI KMWFLLGAPM IAVLANVVEL AYSDRRDHVW SYVGRVCTFY
VTCLMLFVPY YCFRVLRGVL QPASAAGTGF GIMDYVELAT RTLLTMRLGI LPLFIIAFFS
REPTKDLDDS FDYLVERCQQ SCHGHFVRRL VQALKRAMYS VELAVCYFST SVRDVAEAVK
KSSSRCYADA TSAAVVVTTT TSEKATLVEH AEGMASEMCP GTTIDVSAES SSVLCTDGEN
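Most web servers and command-line tools expect the query in FASTA format rather than in the blocks of ten residues shown above. A minimal sketch of the conversion in Python follows; the function name `to_fasta` and the header `query_protein` are illustrative assumptions, not part of the assignment, and only the first line of the sequence is used as a demonstration.

```python
def to_fasta(raw: str, header: str = "query_protein", width: int = 60) -> str:
    """Convert a space- or newline-separated residue listing to FASTA text."""
    # Remove all whitespace (block separators and line breaks) and upper-case.
    residues = "".join(raw.split()).upper()
    if not residues.isalpha():
        raise ValueError("sequence contains non-letter characters")
    # Wrap the sequence at a fixed width, as is conventional for FASTA files.
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Demonstration on the first line of the query sequence only.
first_line = "MSPSVEETTS VTESIMFAIV SFKHMGPFEG YSMSADRAAS DLLIGMFGSV SLVNLLTIIG"
fasta = to_fasta(first_line)
print(fasta)
```

Passing all seven lines of the sequence as one string gives the complete FASTA record.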
There should be an abstract of up to 250 words, followed by an introduction detailing what you have done and why it is interesting, perhaps with a brief literature review if relevant. You are not expected to give details of the methods you have used, but do cite the primary references for any methods you use. The remainder of the paper should be:
- A Results and Discussion section, giving relevant results and discussing their significance;
- A Conclusions section, where you review the significance of your results and comment on the usefulness of the methods used.
Marks will be awarded as follows:
- Abstract (5%) – awarded for a clear and concise abstract of the paper.
- Introduction (10%) – awarded for a clear introduction to the study and its motivation.
- Methods (5%) – awarded for the choice of a suitable number of relevant investigations. Do not include a literature review of the methods used.
- Results and Discussion (30%) – awarded for the clarity of the presentation of results, and the choice of an appropriate level of detail.
- Conclusions (20%) – awarded for a discussion showing theoretical insight into the methods chosen, the likely accuracy of any predictions, and the biological relevance of the results.
- References (10%) – awarded for appropriate and adequate use of references.
- Presentation (20%) – awarded for clear presentation in all sections. Overlong papers will be penalised by 5%, just as they are penalised when submitted to real scientific journals. Good marks will be obtained if the relevant information is given concisely, but with sufficient detail that an expert reader could repeat the investigations if necessary.
The assessment is open-ended, and is therefore more like a mini research project. Here are some ideas about the sorts of things you might investigate:
- Searching protein sequence databases for related sequences using BLAST, FASTA or Smith-Waterman algorithms.
- Prediction of likely function of the sequence by similarity methods.
- Deduction of the domain structure of the sequence from the results of sequence searches.
- Analysis of the appearance of the sequence, or domains from it, in other organisms, or other kingdoms of life.
- Analysis of the sequence using PROSITE, PRINTS, BLOCKS or Pfam.
- Doing database searches with PSI-BLAST.
- Making multiple alignments of the sequence (or domains from it) with related sequences.
- Making phylogenetic trees based on multiple alignments.
- In the case of a sequence of known structure, searching for related structures.
- Prediction of secondary structure for the sequence, or a domain from it.
- Prediction of tertiary structure – Comparative Modelling.
- Prediction of tertiary structure – Fold Recognition.
- Prediction of trans-membrane segments.
- Prediction of protein-protein interactions.
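As an illustration of one item in the list, prediction of trans-membrane segments, the sketch below computes a sliding-window hydropathy profile using the Kyte-Doolittle scale. The window of 19 residues and the threshold of 1.6 are the values commonly quoted for this scale; the function names and the synthetic test sequence are assumptions made for this example. It is a crude first pass, not a substitute for a dedicated predictor.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of `window` consecutive residues."""
    scores = [KD[r] for r in seq]  # raises KeyError on non-standard residues
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one residue to the right.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_segments(seq, window=19, threshold=1.6):
    """1-based (first, last) residue spans covered by runs of windows above threshold."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start + 1, len(profile) - 1 + window))
    return segments


# Synthetic example: a 25-residue leucine stretch flanked by aspartates.
toy = "D" * 20 + "L" * 25 + "D" * 20
print(candidate_segments(toy))
```

Because each reported span is the union of all windows above the threshold, it overestimates the hydrophobic stretch at both ends; candidate regions should be checked against a dedicated trans-membrane prediction server before any conclusions are drawn.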