Assignment Generator :: Advanced Instructions
Scientists use different methods, such as microscopy and biochemical tests, to identify bacteria. A different approach uses the sequencing of the 16S rRNA gene. This gene is highly conserved in bacteria (and other organisms) and strongly correlated with their evolutionary history. Therefore, sequencing this gene gives a good indication of the identity of bacteria without growing and isolating them. This methodology is highly valued and widely used in scientific fields as diverse as medicine, plant pathology and geochemistry.
1. Upload sequences to myRDP account
The sequence file is in FASTA format and can be opened with text editors such as TextEdit on Mac OS X and WordPad on PC. Do not use Notepad on PC to view the file, as it will show the text on a single line. The format should appear as follows. The first line is devoted to the name of the sequence; it always begins with a right angle bracket (>) followed by the name and a carriage return. The sequence begins on the second line, and the sequence text soft wraps (no carriage return between lines) until the second and last carriage return of the file, which marks the end of the sequence.
>ID "Student name" (carriage return)
SEQUENCE HERE…END (carriage return)
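Before uploading, the layout described above can be checked with a short script. The following is a minimal Python sketch, not part of the RDP tools: the function name `check_fasta` and the set of accepted characters (IUPAC nucleotide codes plus a gap character) are assumptions made for illustration. It also accepts sequences that are hard-wrapped over several lines, which is common in FASTA files.

```python
def check_fasta(text):
    """Return (name, sequence) for a single-record FASTA string.

    Raises ValueError if the text does not follow the expected layout.
    """
    # splitlines() handles Mac, Unix and Windows line endings alike.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("the first line must start with '>'")

    name = lines[0][1:].strip()
    body = lines[1:]
    if any(line.startswith(">") for line in body):
        raise ValueError("more than one sequence record found")

    # Join wrapped lines into one sequence string.
    sequence = "".join(body).upper()
    if not sequence:
        raise ValueError("no sequence found after the name line")

    # Assumption: IUPAC nucleotide codes plus '-' are acceptable characters.
    allowed = set("ACGTUNRYKMSWBDHV-")
    unexpected = set(sequence) - allowed
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(sorted(unexpected)))
    return name, sequence


if __name__ == "__main__":
    example = '>ID "Student name"\nACGTTGCA\nacgu\n'
    print(check_fasta(example))
```

To check a real file, read it with `open(path).read()` and pass the text to `check_fasta`.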
- Go to the Ribosomal Database Project at http://rdp.cme.msu.edu
- Log into your myRDP account, either a shared class myRDP account or your personal myRDP account, depending on the requirement from your instructor.
- Click “Upload” to go to the Upload Sequences page.
- Choose “Bacteria 16S rRNA” as the gene for the aligner, fill in the rest of the form and select your sequence FASTA file. Click on the “Upload” button.
- On the Upload Sequences Confirmation page, click “Continue”.
- When the upload completes, the Overview page displays the alignment status of your sequences: total, pending (awaiting alignment), A (aligned), F (failed alignment) and U (unaligned). Wait until the number of pending sequences reaches 0, which indicates that all your sequences have been processed by the RDP aligner.
2. Classify sequences
- On the myRDP Overview page, select your sequence group by clicking the button in front of your group. The selected sequences will be saved in the Sequence Cart.
- Go to the Classifier analysis tool.
- Click on the "Do Classification with Selected Sequences" button.
- Record the classification information from Domain to Genus.
3. Identify the closest relatives to sequences
- Go to the Sequence Match analysis tool (also called SEQMATCH).
- Change the options below to:
- Strain: Both
- Source: Isolates
- Size: >=1200
- Quality: Good
- Taxonomy: Nomenclatural
- KNN matches: 5
- Click on the "Do Seqmatch with Selected Sequences" button.
The Sequence Match results show the closest matches and how they fit in the nomenclatural taxonomy.
- Go to "view selectable matches." Only five matches should be displayed. If more than five matches are displayed, go to the bottom of the page and change the options as indicated above.
- Do all the matches have the same classification? If not, record the classification for the matches.
- Change the KNN Matches from 5 to 1. After this, only the closest match will be displayed. Record the name of the closest relative and its S_ab score.
In some cases the name of the sequence will not agree with the classification. That is OK; it reflects changes in the taxonomy or nomenclature, mislabeling, etc.
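As background for the S_ab score recorded above: the RDP describes it as the number of unique 7-base oligomers shared between your sequence and a database sequence, divided by the lowest number of unique oligomers in either of the two sequences. The Python sketch below illustrates that definition only. The function names are hypothetical, and the actual Sequence Match implementation may differ in details such as how ambiguous bases are handled.

```python
def unique_oligomers(sequence, k=7):
    """Return the set of distinct k-base words found in the sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def s_ab(query, reference, k=7):
    """Shared unique k-mers divided by the smaller unique k-mer count."""
    query_words = unique_oligomers(query, k)
    reference_words = unique_oligomers(reference, k)
    if not query_words or not reference_words:
        # One of the sequences is shorter than k bases.
        return 0.0
    shared = query_words & reference_words
    return len(shared) / min(len(query_words), len(reference_words))


if __name__ == "__main__":
    # Toy sequences, far shorter than a real 16S rRNA gene.
    print(s_ab("ACGTACGTAC", "ACGTACGTAC"))
    print(s_ab("ACGTACGTAC", "ACGTACGTTT"))
```

A score of 1 means that every unique 7-mer of the shorter word set is also present in the other sequence; lower scores indicate more distant matches.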
4. Select closest relatives, reference strains and outgroup
To construct a phylogenetic tree, we need to select close relatives of the unknown sequence as well as sequences from well-characterized groups.
- Selecting closest isolated relative
- On the previous Seqmatch result page, go to the bottom of the page and change the KNN Matches from 5 to 1 in the Dataset options, verify that Isolates and >=1200 are selected, and then press the “Refresh” button. After this, only the closest match will be displayed, in this case the closest isolated relative.
- Record the name of the closest relative and its similarity score (highlighted in pink).
- Select the relative by checking the box to its left, and then click on the “Save selection and return to summary” button.
- Selecting closest relative (isolated or uncultured)
- In the Dataset options, change the Source option from “Isolates” to “Both” and then press the “Refresh” button. Now, isolated bacteria as well as as-yet-uncultured ones in the database will be examined.
- Click on “view selectable matches”, and record the name of the closest relative and its similarity score (highlighted in pink).
- Select the relative by checking the box to its left, and then click on the “Save selection and return to summary” button.
- Selecting closest type strain
- In the Dataset options, change the Strain option from “Both” to “Type” and then press the “Refresh” button. Now, only type strains will be displayed. Type strains are well-characterized groups that link phylogeny with taxonomy.
- Click on “view selectable matches”, and record the name of the closest type strain relative and its similarity score (highlighted in pink).
- Select the relative by checking the box to its left, and then click on the “Save selection and return to summary” button.
- Selecting other reference and outgroup sequences
- Save these sequence IDs in a plain text file using a text editor:
S000414699 C. botulinum
S000260778 C. tetani
S000437209 C. acetobutylicum
S000437391 P. stutzeri
S000010427 P. aeruginosa
S000022197 P. fluorescens
S001020555 Archaea outgroup
- Go to the SEQCART.
- Below “Retrieve an existing file to current session”, choose the sequence ID file to upload and click on the “Retrieve” button. These sequences will be automatically selected in the Hierarchy Browser.
- Go to Browsers, and then click on the button next to “Bacteria” to save the selection.
In some cases, the closest relative will also be the closest isolated relative, so you do not need to add it again to the selection. In other cases, there will be a big difference in similarity between the isolated relative and uncultured ones. This could indicate that the group is hard to isolate in pure cultures.
It is possible that the closest type strain has already been selected. If that is the case, do not add it again to the selection.
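The reference and outgroup IDs listed in this step can also be written to a plain text file with a short script instead of a text editor. The Python sketch below assumes that the ID file holds one sequence ID per line, without the organism names; that layout and the file name `reference_ids.txt` are assumptions, so check them against what the SEQCART accepts.

```python
# Sequence IDs and labels exactly as listed in this step.
REFERENCE_IDS = {
    "S000414699": "C. botulinum",
    "S000260778": "C. tetani",
    "S000437209": "C. acetobutylicum",
    "S000437391": "P. stutzeri",
    "S000010427": "P. aeruginosa",
    "S000022197": "P. fluorescens",
    "S001020555": "Archaea outgroup",
}


def id_file_text(ids):
    """Build the file content: one sequence ID per line (assumed layout)."""
    return "\n".join(ids) + "\n"


def write_id_file(path, ids):
    """Write the IDs as plain ASCII text."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(id_file_text(ids))


if __name__ == "__main__":
    # Hypothetical file name; any plain text file name works.
    write_id_file("reference_ids.txt", REFERENCE_IDS)
    print(id_file_text(REFERENCE_IDS), end="")
```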
5. Construct phylogenetic tree
Before constructing the tree, make sure you have all the sequences you need. By this time you should have 9 to 11 sequences in your selection: your own sequence, three relatives (type, isolated, isolated or uncultured), six references, and the archaea outgroup. You can have fewer than three relatives if the closest sequence is also the closest type strain and/or the closest isolated relative.
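The count of 9 to 11 follows from how many of the three relatives are distinct sequences. The Python sketch below works through that arithmetic; the function name and the example IDs are hypothetical and serve only to illustrate the check.

```python
def expected_selection_size(type_strain_id, isolate_id, closest_id, n_references=6):
    """Count the sequences that should be in the selection.

    The three relative IDs may coincide, so they are counted as a set.
    """
    relatives = {type_strain_id, isolate_id, closest_id}
    own_sequence = 1
    outgroup = 1
    return own_sequence + len(relatives) + n_references + outgroup


if __name__ == "__main__":
    # Hypothetical IDs: three different relatives give the maximum of 11.
    print(expected_selection_size("S_type", "S_isolate", "S_closest"))
    # The same sequence is the closest type strain, isolate and overall match.
    print(expected_selection_size("S_same", "S_same", "S_same"))
```

If the number of sequences in your selection does not match this count, review the selections from step 4 before building the tree.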
- Go to Tree Builder.
- Under alignment model, select the bacterial one.
- Select [S001020555] as the outgroup.
- Press the “create tree” button.
- The phylogenetic tree is now shown. The numbers at each branching point are the bootstrap values.
- Save the tree by first clicking in the box where the tree is displayed and then pressing CTRL+P. The saved tree is in PostScript format. For Mac users, clicking on the file will convert it automatically to a PDF file. PC users need to use the link provided on the webpage to convert the file to a PDF (PSDPDF).
- Save the PDF file containing your phylogenetic tree and include it in your report.
Questions
1. What is known about this genus in terms of its ecological importance, its usefulness to humans and its potential to cause diseases?
2. What is known about the selected reference strains in terms of their ecological importance, their usefulness to humans and their potential to cause diseases?
3. In the phylogenetic tree, how many groups of sequences can you identify? Is your unknown sequence inside any of the reference strain groups?
4. In the tree topology, is there a big difference in bootstrap values within and between groups? If so, what could be a possible explanation for this difference?
Questions/comments: rdpstaff@msu.edu
