Uploading and Aligning Sequences
To align your 16S rRNA sequences, select the "upload" page once you are logged in to myRDP. Choose the appropriate aligner for your sequences. If your dataset contains both bacterial and archaeal sequences, follow these instructions to separate the sequences before you continue. Enter a group name that will be used to refer to your uploaded sequences collectively. You may also enter project and note information to help you manage your data, but these are not required. Finally, choose a FASTA file containing your unaligned sequences, and press UPLOAD.
Your sequences will first be checked for sufficient length (200 base pairs or longer), valid IUPAC DNA or RNA nucleotide codes, and the lack of large stretches of ambiguity characters. If any problems are found, you will have to edit your FASTA file and start this process again.
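If you would like to catch these problems before uploading, the checks described above can be approximated locally. The following is a minimal sketch, not the code the RDP runs: the 200-base minimum comes from this page, but the ambiguity-run threshold (`MAX_AMBIGUOUS_RUN`) and the function names are assumptions made for illustration.

```python
import re

# IUPAC nucleotide codes for DNA and RNA (unambiguous bases plus ambiguity codes).
IUPAC_CODES = set("ACGTURYSWKMBDHVN")
MIN_LENGTH = 200         # minimum length stated on this page
MAX_AMBIGUOUS_RUN = 10   # ASSUMPTION: the real threshold is not documented here


def read_fasta(lines):
    """Yield (seq_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def check_sequence(seq):
    """Return a list of problems found in one unaligned sequence."""
    problems = []
    seq = seq.upper()
    if len(seq) < MIN_LENGTH:
        problems.append("too short")
    invalid = set(seq) - IUPAC_CODES
    if invalid:
        problems.append("invalid characters: " + "".join(sorted(invalid)))
    # Longest run of ambiguity codes (anything valid other than A, C, G, T, U).
    runs = re.findall(r"[RYSWKMBDHVN]+", seq)
    if runs and max(len(r) for r in runs) > MAX_AMBIGUOUS_RUN:
        problems.append("long stretch of ambiguity characters")
    return problems
```

To use it, open your FASTA file, pass the file object to `read_fasta`, and call `check_sequence` on each sequence; an empty list means none of these checks flagged that sequence.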
Once your sequences are uploaded, you can continue to use the RDP as normal while they are processed in the background. Your new sequences will be listed as "pending" on the myRDP overview page until the processing is complete. Processing can take anywhere from several seconds to several days or more, depending on 1) the number of sequences you submitted, 2) the number of other users using the RDP, and 3) internal RDP processes that are using our computing resources. If processing lasts more than a day or two, you may wish to email us to make sure things are working properly. Please include the email address of your myRDP account and the group name of the pending sequences.
The myRDP overview page will list the number of sequences that have successfully aligned and allow you to select them. Once you have selected sequences, you can then click on the myRDP download page and download the sequences in a variety of aligned or unaligned formats or as a distance matrix. You can also download a taxonomic tree (of interest to ARB users). If you wish to include RDP released sequences with your personal data, go to the RDP Hierarchy Browser and select the desired sequences, then return to the myRDP download page.
How to align a dataset with mixed bacterial-archaeal sequences
Upload the sequence file to RDP Classifier. On the hierarchy view page, click the "show assignment detail" link next to one of the domains, say Bacteria. Then, on the assignment detail page, click the "download as text file" button to download the assignment detail file for the sequences assigned to domain Bacteria. Next, use the Fasta Sequence Selection tool on the Pyrosequencing Pipeline site to copy only the bacterial sequences from the original sequence file into a new file. Repeat these steps for domain Archaea to obtain a second file containing only archaeal sequences. Upload the two new files to myRDP following the instructions in Uploading and Aligning Sequences.
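The selection step can also be illustrated offline. The sketch below is not the Fasta Sequence Selection tool itself; it assumes that each non-empty line of the downloaded assignment detail file begins with the sequence ID, followed by whitespace. Check your own file before relying on that assumption. The function names are hypothetical.

```python
def ids_from_assignment_detail(lines):
    """Collect sequence IDs, assuming the ID is the first field on each line."""
    ids = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            ids.add(fields[0])
    return ids


def select_fasta(lines, wanted_ids):
    """Yield the FASTA lines of records whose ID is in wanted_ids."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            # The ID is the text between ">" and the first whitespace.
            keep = bool(header) and header[0] in wanted_ids
        if keep:
            yield line
```

Run it once with the Bacteria assignment detail file and once with the Archaea file, writing the lines yielded by `select_fasta` to two separate FASTA files.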
About the RDP aligner:
Sequences uploaded to myRDP are aligned using the same aligner program used to align all RDP10 released sequences. These new alignments are created using the Infernal secondary structure based aligner, the same aligner used to provide alignments in the Rfam database of untranslated RNA molecules. We trained the aligner on a small, hand-curated set of high-quality full-length rRNA sequences derived mainly from genome sequencing projects. Please see the RDP10 Release Notes for more information on the alignment.
Why some sequences fail to align:
There are many reasons a sequence may fail to align, but the most frequent cause is that the sequence is not 16S rRNA but some other type of gene. Also, depending on the region of the 16S rRNA gene sequenced, there may not be enough information for the aligner to use. Shorter sequences are more likely to fail because they carry less information.
Who can see my data?
Data uploaded to myRDP remains private. It is only available to the account that submitted it, or to other RDP user accounts (Buddies, see below) that have been granted access by you, the submitting user.
Will my data appear in GenBank, EMBL, and/or DDBJ?
The RDP does not submit data to the public sequence repositories. If you wish to obtain accession numbers for your data and have it available on GenBank and other databases, you will have to do this yourself. Please visit the GenBank, EMBL, or DDBJ websites for help with this.
myRDP Download Page
The myRDP download page allows you to download your data, RDP data, or some combination thereof. To select your own data, go to the myRDP overview page. To select RDP sequences, go to the RDP Hierarchy Browser, just as you always have. On either page, click the add icons to add sequences for download, and click the remove icon to remove them. All options on the download page will be unavailable until you have selected sequences.
The download page displays the number of RDP sequences selected, the number of selected user sequences, and the number of selected user sequences that were aligned successfully. Sequences that have not finished aligning, or that failed to align, are automatically removed from aligned file and distance matrix downloads. Unaligned file downloads include both unaligned and aligned sequences.
There are 3 types of downloads available from the myRDP Download page:
- Flat file
- Download an aligned or unaligned FASTA or Phylip formatted file containing your selections. An alignment mask pseudo-sequence, which can be used to 'mask out' unalignable variable regions from further analysis, and a canonical base-pair structure pseudo-sequence can be included with alignment downloads if "show mask/structure" is checked.
- Navigation Tree
- Download a Newick-formatted tree that can be used with ARB. This tree is generated by the RDP Classifier.
- Distance Matrix
- Download an uncorrected or Jukes-Cantor corrected distance matrix. This option also lets you specify whether sequences are labeled with the RDP Sequence ID or with your own sequence ID or accession number. The distance matrix is formatted similarly to the output of DNADist from the Phylip package, and should work in most programs that require dnadist-formatted matrices.
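To make the difference between the two distance options concrete, here is a minimal sketch of the standard definitions: the uncorrected distance is the fraction of compared positions that differ, and the Jukes-Cantor correction is d = -3/4 ln(1 - 4p/3). How the RDP treats gaps and ambiguity codes is not described on this page, so skipping any position where either sequence has something other than A, C, G, T, or U is an assumption of this example, as are the function names.

```python
import math


def uncorrected_distance(a, b):
    """Fraction of compared positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Treat RNA and DNA spellings of the same base as identical.
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    compared = mismatches = 0
    for x, y in zip(a, b):
        # ASSUMPTION: gaps and ambiguity codes are ignored in both sequences.
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an uncorrected distance p."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)
```

For example, an uncorrected distance of 0.1 corresponds to a Jukes-Cantor distance of roughly 0.107; the correction grows as sequences diverge because it accounts for multiple substitutions at the same site.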
The Phylip-formatted, Navigation Tree, and Distance Matrix downloads are limited to 2000 sequences to reduce the stress on our servers. If you need to download more than 2000 sequences, please do not hesitate to contact firstname.lastname@example.org.
The overview page shows your uploaded data and its status. The groups of sequences you have uploaded are displayed with the most recent uploads shown first.
Each group has the following bits of information associated with it:
| Submitter ID | The email of the account that initially uploaded the data |
| Date | Date the data was uploaded |
| Project | Project name that was given at upload time, if any |
| Initializing | If the group says "Initializing", the sequences are still being pre-processed before they are submitted to the RDP alignment program; this status disappears once the initialization steps are completed |
| Total | Number of sequences uploaded to this group |
| Pending | Number of sequences awaiting alignment |
| A | Number of successfully aligned sequences |
| F | Number of sequences that failed to align (see Why some sequences fail to align) |
| U | Number of sequences that are unaligned (not submitted to the aligner) |
Groups can be selected or deselected using the icon next to their name. Once selected, sequences can be downloaded from the myRDP download page.
Clicking on the group's name will take you to a list of sequences in that group. The sequence list shows the alignment status, name, and description of each sequence. Clicking on a sequence ID in the sequence list will display the unaligned sequence data and the classification result of the sequence as determined by the RDP Classifier. You can also select or deselect an individual sequence by checking or unchecking the checkbox in front of it. Click the "UPDATE SELECTION" button to save the selections.
Clicking on a group's name from the overview page will take you to a list of sequences in that group and the information about the group. Some of the information, such as project and note, can be updated from the Edit Group Info page by clicking "EDIT GROUP". Clicking the "VIEW CLASSIFICATION" button will display the sequences in a taxonomic hierarchy based on the classification results from RDP Classifier.
This page lets you update your account information and your super buddies list.
This page allows you to change your password.
You can grant access to your data to other myRDP users. When you grant access to other users, your data will appear on their myRDP overview page exactly like their own data, except that your account name is listed as the submitter ID. Likewise, their data will appear on your overview page. There is a button to show or hide buddies' data. Buddies can use the data like any other myRDP data, for example for downloads or further analyses. A buddy can NOT modify your sequence data, but can update the project name and add notes for a sequence group.
There are two ways to grant access to your data to other users:
- Create a super buddy
- Give a user access to all your data, like having a "master key" to all the locks in a building. After logging in to myRDP, select "Account Info" from the top menu. Underneath your account information you can enter the account name of someone to whom you wish to grant access. If you wish to revoke someone's access, simply enter their account name a second time to remove it from the list. All data you upload to myRDP will be available to the users on your super buddies list.
- Create a group buddy
- Grant a user access to data on an individual group basis, like having a "special key" to one single lock in a building. If you only want to share a particular group of data with another myRDP user, click on a group name on the myRDP Overview page. Next, click on the EDIT GROUP/BUDDY button. Add a user by entering their account name into the box and clicking ADD/REMOVE BUDDY. Remove a user from the buddy list for this group by entering their account name a second time. You can only grant access to data you submitted. A group buddy does not have to be a member of your super buddy list.
You can save and retrieve aligned sequences on the Sequence Cart page, which is reached from the Hierarchy Browser. See the Hierarchy Browser help for more information.
>mySeqID Followed by an optional description on the remainder of the first line
GGAATCAGCCTCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGAGGCAAGCGTTATCCGGAATTATTGGGCGTA
AAGCGTCCGCAGGGGGCGCATCAAGTCTGCTGTCAAAGGTCGGAGCTTAACTCCGGTGTGGCAGTGGAAACGGGTGCGCT
AGAGGGCGACAGGGGTAGAGGGAATTCCCAGTGTAGCGGTGAAATGCGTAGAGATTGGGAAGAACACCGGTGGCGAAAGC
GCTCTACTGGGTCGCACCTGACCCTCAGGGACGAAAGCTAAGGTAGCGAAAGGGATTAGATACCCCTGTAGTCTTAGCCG
TAAACGATGGATACTAGGCGTGGTTTGTATCGACCCAAGCCGTGCCGAAGC