Assignment Generator :: Basic Instructions
Scientists use different methods, such as microscopy and biochemical tests, to identify bacteria. A different approach uses the sequence of the 16S rRNA gene. This gene is highly conserved in bacteria (and other organisms), and differences in its sequence are strongly correlated with evolutionary history. By sequencing many segments of the rRNA genes present in environmental DNA, we can determine which well-studied organisms are related to the microorganisms present in an environment. This methodology is widely used in scientific fields as diverse as medicine, plant pathology and geochemistry, among others.
In this exercise, the student will use two different bioinformatics methods to estimate the taxonomic placement of unknown microorganisms from their rRNA gene sequences.
1. Obtain your unknown sequences
Your instructor will tell you how to obtain a file containing your rRNA gene sequences. This file is in FASTA format and can be opened with a text editor such as TextEdit on Mac OS X or WordPad on PC. (If you have difficulty opening your file on Mac OS X 10.9, double-click the file in the Finder and choose to open it with TextEdit. If you still have trouble, change the file extension to .txt. Do not use Notepad on PC to view the file, as it will show the text on a single line.)
The FASTA format is a very common sequence file format and should appear as follows: The first line is devoted to the name of the sequence. It always begins with a greater-than sign (>), followed by the name and a hard line return. The sequence begins on the second line, and the sequence text soft wraps (no hard line return between lines) until a final hard line return signifies the end of the sequence.
>SEQUENCE_ID (hard line return)
SEQUENCE HERE…END (hard line return)
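The layout described above can also be read by a short program. The following is a minimal Python sketch, not part of the RDP tools: the function name `parse_fasta` and the example sequences are made up for illustration. It also accepts files in which the sequence is split over several hard-wrapped lines, which some FASTA files use.

```python
def parse_fasta(text):
    """Return a list of (sequence_id, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Sequence lines are joined, so hard-wrapped sequences also work.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Hypothetical example data, not real rRNA sequences.
example = ">unknown_1\nACGTACGT\nTTGA\n>unknown_2\nGGCC\n"
for seq_id, seq in parse_fasta(example):
    print(seq_id, len(seq))
```

Checking the number of records and the sequence lengths this way is a quick sanity test that the file was downloaded intact before uploading it.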
2. Classify your sequences using the SeqMatch k-nearest-neighbor classifier
- Go to the Ribosomal Database Project at http://rdp.cme.msu.edu
- Go to the Sequence Match analysis tool (also called SeqMatch).
- Go to "Choose a file to upload" and select your FASTA file.
- Change the options below to:
  - Strain: Both
  - Source: Isolates
  - Size: >1200
  - Quality: Good
  - Taxonomy: Nomenclatural
  - KNN matches: 5
- Click on "Submit".
The Sequence Match results show the closest matches and how they fit in the nomenclatural taxonomy.
- For each query, record the classification results from domain to the lowest displayed rank for that query. (Usually genus, but sometimes a higher rank will be the lowest displayed.)
- For each query in turn, choose "view selectable matches".
- Change "KNN matches" from 5 to 1. Now only the closest match will be displayed. Record the name of the closest relative and its S_ab score. (Note: In some cases the full name may contain a species name that conflicts with the taxonomic assignment. This can result from a misclassification by the classifier, or from a conflict between the original nomenclature and a phylogenetically consistent taxonomic assessment.)
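To make the S_ab score recorded above more concrete, here is a rough Python sketch. It assumes the commonly given description of S_ab: the number of unique 7-base words shared by the query and a reference sequence, divided by the smaller number of unique 7-base words found in either sequence. The function names and test strings are illustrative, and the actual SeqMatch implementation may differ in its details.

```python
def unique_kmers(seq, k=7):
    """Return the set of distinct k-base words in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def s_ab(query, reference, k=7):
    """Shared unique k-mers divided by the smaller unique k-mer count.

    Assumption: this mirrors the published description of the score,
    not the SeqMatch source code.
    """
    q = unique_kmers(query, k)
    r = unique_kmers(reference, k)
    smaller = min(len(q), len(r))
    if smaller == 0:
        return 0.0  # a sequence shorter than k has no words to compare
    return len(q & r) / smaller


# Toy strings for illustration only; real queries are over 1200 bases long.
print(s_ab("ACGTACGTAC", "ACGTACGTAC"))
print(s_ab("AAAAAAA", "CCCCCCC"))
```

Under this definition, identical sequences score 1.0 and sequences with no shared words score 0.0, so a higher S_ab indicates a closer match.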
3. Classify sequences using the RDP naïve Bayesian Classifier
- Go to the Ribosomal Database Project at http://rdp.cme.msu.edu.
- Go to the Classifier analysis tool.
- Go to "Choose a file to upload" and select your FASTA file. Then click "Submit".
- For each unknown, record the classification information from Domain to Genus.
- Do the SeqMatch and Bayesian Classifier assignments match? If not, how do they differ, and why might they differ?
- Based on the taxonomic assignments, what can you infer about your unknowns in terms of ecological importance, usefulness to humans and potential to cause disease?
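If you have many unknowns, the comparison between the two methods can be organized rank by rank. The sketch below is one possible way to do this in Python. It assumes you have typed each recorded lineage into a dictionary by hand; the rank list, the function name `compare_lineages`, and the example lineages are illustrative placeholders, not output from either RDP tool.

```python
RANKS = ["domain", "phylum", "class", "order", "family", "genus"]


def compare_lineages(seqmatch, classifier):
    """Return (ranks that agree, first rank that differs or None).

    A rank missing from either record counts as a difference, because
    one tool may stop at a higher rank than the other.
    """
    agree = []
    for rank in RANKS:
        a, b = seqmatch.get(rank), classifier.get(rank)
        if a is None or b is None or a != b:
            return agree, rank
        agree.append(rank)
    return agree, None


# Illustrative lineages entered by hand (not real results).
seqmatch_result = {
    "domain": "Bacteria", "phylum": "Proteobacteria",
    "class": "Gammaproteobacteria", "order": "Enterobacteriales",
    "family": "Enterobacteriaceae", "genus": "Escherichia",
}
classifier_result = dict(seqmatch_result, genus="Shigella")

agree, first_diff = compare_lineages(seqmatch_result, classifier_result)
print("Agree through:", agree[-1] if agree else "none")
print("First difference at:", first_diff)
```

The rank at which the two lineages first diverge is a useful starting point for explaining why the methods disagree.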