Assign Gen Help

From Ribosomal Database Project Wiki

How to Run the Assignment Generator Tool?

Download unknown sequences

This file is in fasta format and can be opened with a text editor such as TextEdit on Mac OS X or WordPad on PC. Do not use Notepad on PC to view the file, as it will show all of the text on a single line. The format should appear as follows: the first line holds the name of the sequence and always begins with a greater-than sign (>), followed by the name and a carriage return. The sequence begins on the second line and soft wraps (no carriage return between lines) until the second and last carriage return of the file, which marks the end of the sequence. (NOTE: an error in this format is the most frequent cause of files failing to upload, so be sure to check this before contacting RDP.)

>ID "Student name" (hard line return)
SEQUENCE HERE…END (hard line return)
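
Because a format error is the most common reason an upload fails, it can help to check the file before submitting it. The following is a minimal sketch in Python, not an RDP tool: the function name `check_fasta` is hypothetical, and it assumes the file holds exactly one sequence written with letters only, as in the layout shown above.

```python
def check_fasta(text: str) -> list[str]:
    """Return a list of problems found in the text of a one-sequence fasta file."""
    problems = []
    # Treat Windows (\r\n) and old Mac (\r) line returns the same as \n.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # Ignore empty lines left behind by the final line return.
    while lines and lines[-1].strip() == "":
        lines.pop()
    if not lines:
        return ["file is empty"]

    # The first line must be the name line: '>' followed by a name.
    if not lines[0].startswith(">"):
        problems.append("first line must start with '>'")
    elif lines[0][1:].strip() == "":
        problems.append("name line has no sequence name")

    # Assumption: one sequence per file, so no further '>' lines are expected.
    if any(line.startswith(">") for line in lines[1:]):
        problems.append("more than one name line found")

    # Join the remaining lines; hard-wrapped sequence lines are tolerated here.
    sequence = "".join(
        line.strip() for line in lines[1:] if not line.startswith(">")
    )
    if sequence == "":
        problems.append("no sequence found after the name line")
    elif not sequence.isalpha():
        # Assumption: the sequence is written with letters only.
        problems.append("sequence contains non-letter characters")
    return problems
```

An empty list means the file matches the expected layout. To check a file on disk, read it first, for example `check_fasta(open("unknown.fasta").read())`, where `unknown.fasta` stands in for your own file name.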

Classify sequences into the taxonomy

  • Go to the Ribosomal Database Project at http://rdp.cme.msu.edu
  • Go to the Classifier analysis tool.
  • Go to "Choose a file to upload" and select your fasta file. Then click on the "Submit" button.
  • Record the classification information from Domain to Genus. It is important to know whether your sequence belongs to the Archaea or the Bacteria, since their secondary structure models are different.

Upload sequences to your myRDP account

  • Log into your myRDP account, either a shared class myRDP account or your personal myRDP account, depending on the requirement from your instructor.
  • Click "Upload" for the Upload Sequences page.
  • Choose "Bacteria 16S rRNA" or "Archaea 16S rRNA" as the gene for aligner depending on your previous Classifier results, fill in the rest of the form and select your sequence fasta file.
  • Click on the "Upload" button.
  • On the Upload Sequences Confirmation page, click "Continue".
  • Your sequences will show that they are "INITIALIZING" for a few minutes (you might want to Reload the browser page to see the results if this procedure takes too long).
  • Wait until the number of pending sequences becomes 0, which indicates all your sequences have been processed by RDP aligner.
  • Click on the Group name to see your results in a "View Group List" page.
  • Once the upload completes, the "View Group List" page displays the alignment status of your sequences: total, pending (awaiting alignment), A (Aligned), F (Failed alignment) and U (Unaligned).
  • Record the classification information from Domain to Genus. (Explore some of the details by clicking the VIEW CLASSIFICATION button and clicking on the seqname.)

Identify the closest relatives to sequences

  • Go to the Sequence Match analysis tool (also called SeqMatch).
  • Change the options below to:
Strain: Both
Source: Isolates
Size: >1200
Quality: Good
Taxonomy: Nomenclatural
KNN matches: 5
  • Click on the "Do SeqMatch with Selected Sequences" button. The Sequence Match results show the closest matches and how they fit in the nomenclatural taxonomy.
  • Go to "view selectable matches." Only five matches should be displayed. If more than five matches are displayed, go to the bottom of the page and change the options as indicated above.
  • Do all the matches have the same classification? If not, record the classification for the matches.
  • Change the KNN Matches from 5 to 1. After this, only the closest match will be displayed. Record the name of the closest relative and its S_ab score.
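
To give a feel for what the S_ab score recorded in the last step measures, here is a small Python sketch. It assumes the usual description of S_ab: the number of unique 7-base oligomers shared by the two sequences, divided by the smaller of the two unique-oligomer counts. The function names are hypothetical, and this is an illustration only, not the code SeqMatch runs.

```python
def unique_kmers(seq: str, k: int = 7) -> set[str]:
    """Return the set of distinct k-base words found in seq."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty range, hence an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def s_ab(query: str, reference: str, k: int = 7) -> float:
    """Shared unique k-mers divided by the smaller unique k-mer count."""
    a = unique_kmers(query, k)
    b = unique_kmers(reference, k)
    smaller = min(len(a), len(b))
    if smaller == 0:
        # One of the sequences is too short to contain any k-mer.
        return 0.0
    return len(a & b) / smaller
```

Under this definition, two identical sequences score 1.0 and two sequences with no 7-base word in common score 0.0.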

In some cases the name of the sequence will not agree with the classification. That is OK; it reflects changes in the taxonomy or nomenclature, mislabeling, etc.

Select closest relatives, reference strains and outgroup

To construct a phylogenetic tree, we need to select close relatives of the unknown sequence as well as sequences from well-characterized groups.

Selecting closest isolated relative
  • In the previous SeqMatch result page, go to the bottom of the page and change the KNN Matches from 5 to 1 at the Dataset options, verify that Isolate and >=1200 are selected, and then press "Refresh". After this, only the closest match will be displayed, in this case the closest isolated relative.
  • Record the name of the closest relative and its similarity score (pink highlighted).
  • Select the relative by checking its box on the left, and then click on the "Save selection and return to summary" button.
Selecting closest relative (isolated or uncultured)
  • In the Dataset options change the Source option from "Isolates" to "Both" and then press the "Refresh" button. Now, isolated bacteria as well as yet uncultured ones in the database will be examined.
  • Click on "view selectable matches", and record the name of the closest relative and its similarity score (pink highlighted).
  • Select the relative by checking its box on the left, and then click on the "Save selection and return to summary" button.
  • In some cases, the closest relative will also be the closest isolated relative, so you do not need to add it to the selection again. In other cases, there will be a big difference in similarity between the isolated relative and the uncultured ones. This could indicate that the group is hard to isolate in pure culture.
Selecting closest type strain
  • In the Dataset options change the Strain option from "Both" to "Type" and then press the "Refresh" button. Now, only type strains will be displayed. Type strains are well characterized groups that link phylogeny with taxonomy.
  • Click on "view selectable matches", and record the name of the closest type strain relative and its similarity score (pink highlighted).
  • Select the relative by checking its box on the left, and then click on the "Save selection and return to summary" button.
  • It is possible that the closest type strain has already been selected. If that is the case, do not add it again to the selection.
Select other reference and outgroup sequences

Reference sequences, especially those from type strains, help link the taxonomy with the phylogeny and are also good guides for judging the quality of the phylogenetic tree.

  • Save these sequence IDs (only the codes and NOT the names) in a plain text file using a text editor:

L37585 C. botulinum
X74770 C. tetani
U16166 C. acetobutylicum
U26262 P. stutzeri
X06684 P. aeruginosa
Z76662 P. fluorescens
AF242652 M. acididurans
X99560 S. marinus
_L77117 Archaea outgroup
_J01695 Bacterial outgroup E. coli

  • Go to the SEQCART.
  • Below "Retrieve an existing file to current session", choose the sequence ID file to upload, check "Genbank ACCNO", and click on the "Retrieve" button. These sequences will be automatically selected in the Hierarchy Browser.
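
The ID file needs the codes only, one per line. If you copied the reference list above together with the organism names, a short Python sketch like the following can strip the names. The function names and the output file name are hypothetical; the sketch simply keeps the first word of each line exactly as typed.

```python
def accession_codes(listing: str) -> list[str]:
    """Keep only the first word (the sequence ID) of each non-empty line."""
    codes = []
    for line in listing.splitlines():
        parts = line.split()
        if parts:
            # parts[0] is the code; the rest of the line is the name.
            codes.append(parts[0])
    return codes


def write_id_file(listing: str, path: str) -> None:
    """Write the codes to a plain text file, one code per line."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for code in accession_codes(listing):
            handle.write(code + "\n")
```

For example, `write_id_file("L37585 C. botulinum\nX74770 C. tetani\n", "reference_ids.txt")` produces a two-line file holding `L37585` and `X74770`.
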
Construct phylogenetic tree

Before constructing the tree, make sure you have all the sequences you need. By this time your selection should hold 12 to 14 sequences: your own sequence, three relatives (type, isolated, isolated or uncultured), and ten references including the outgroups. You can have fewer than three relatives if the closest sequence is also the closest type strain and/or the closest isolated relative.
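
The count of 12 to 14 follows from simple arithmetic: one unknown sequence, plus one to three distinct relatives, plus ten references. A minimal Python sketch of that tally is shown here; the function name and the placeholder relative names in the usage note are hypothetical.

```python
def selection_size(type_strain: str, isolate: str, any_source: str,
                   n_references: int = 10) -> int:
    """Count the sequences expected in the selection before building the tree."""
    # A set keeps each relative once, even if it fills more than one role.
    relatives = {type_strain, isolate, any_source}
    # 1 accounts for your own unknown sequence.
    return 1 + len(relatives) + n_references
```

With three different relatives, `selection_size("relA", "relB", "relC")` gives 14. If one sequence is at once the closest type strain, the closest isolate and the closest relative overall, `selection_size("relA", "relA", "relA")` gives 12.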

The references and outgroups include both Archaea and Bacteria sequences aligned using different secondary structure models. To make a phylogenetic tree you must choose the alignment model that corresponds to your unknown sequence.

  • Go to Tree Builder.
  • Under alignment model select the bacterial or archaeal model.
  • Select [S001020555] as the outgroup if your sequence is from a bacterium, or [S001020552] if your sequence is from an archaeon.
  • Press the "create tree" button.
  • Now the phylogenetic tree is shown. The numbers in each branching point are the bootstrap values.
  • Save the tree by first clicking in the box where the tree is displayed and then pressing CTRL+P. The saved tree is in PostScript format. For Mac users, clicking on the file will convert it automatically to a PDF file. PC users need to use the link provided on the webpage to convert the file to a PDF (PSDPDF).
  • Save the PDF file containing your phylogenetic tree and include it in your report.