
Ribosomal Database Project-II Release 9 Notes

RDP Release 9.59 (Release 9, update 59) consists of 489,840 aligned and annotated 16S rRNA sequences, along with seven online analysis tools. Update 59 was released on Mar 5, 2008.


Changes from Previous RDP Releases


Release 9 has several important changes from RDP 8.1 and earlier.

Alignment Strategy
All sequences in Release 9 are aligned against a general Bacterial rRNA alignment model using a modified version of the program RNACAD (1), a Stochastic Context-Free Grammar (SCFG) based rRNA aligner that directly incorporates rRNA secondary structure information into its internal model. This aligner is trained on a set of high-quality hand-aligned sequences and incorporates the conserved Bacterial secondary structure model of Gutell and co-workers (2).

All Release 9 tools use a new RDP Hierarchy, which differs significantly from the Release 8.1 Hierarchy. The RDP Hierarchy is now based on the new phylogenetically consistent higher-order bacterial taxonomy proposed by Garrity et al. (3), with the addition of published informal classifications for well-defined lineages with few cultivated members, such as Acidobacteria, Verrucomicrobia and OP11 (4, 5, 6). Sequences are placed in the Hierarchy using the RDP Classifier.

Some tools also allow the user to view data in a taxonomy as classified by NCBI.

Data Updates
Release 9 will be kept up to date by frequently synchronizing its sequence data with the major sequence repositories. The goal is for sequence data to be no more than several months behind GenBank in content.

Sequence IDs
All sequence identifiers have been redone starting with Release 9. The previous short identifiers have been replaced with RDP SIDs, ten-character identifiers that start with an "S" followed by nine digits.
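As a minimal sketch of the identifier format described above, the following Python check tests whether a string has the shape of a Release 9 SID. The function name is hypothetical and is not part of any RDP tool; the pattern is taken only from the "S plus nine digits" description.

```python
import re

# Pattern from the description above: an "S" followed by exactly nine digits.
RDP_SID_PATTERN = re.compile(r"S[0-9]{9}")


def is_rdp_sid(identifier: str) -> bool:
    """Return True if the identifier has the shape of a Release 9 SID."""
    return RDP_SID_PATTERN.fullmatch(identifier) is not None


print(is_rdp_sid("S000380829"))  # Release 9 style identifier
print(is_rdp_sid("E.coli"))      # Release 8.1 style short identifier
```

A check like this only confirms the shape of an identifier; it does not confirm that the SID exists in the database.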


Release 9 Analysis Tools


All tools for Release 9 are completely new. Previous release tools could not handle the amount of data the RDP now contains. Most tools incorporate the idea of a data set. Data sets allow users to restrict the amount of data they view or search to Environmental or Isolates, Type or Non-Type, and short- or near-full-length sequences.

Hierarchy Browser
Sequences can be viewed in either the new RDP Taxonomy or NCBI's Taxonomy, searched, selected, and downloaded. In addition, the Hierarchy Browser can be used to generate a newick-formatted representation of selected sequences to make it easier to import RDP sequence data into ARB.

RDP Classifier
The RDP Classifier uses a naive Bayesian classifier to assign sequences to the RDP Taxonomy. The classifier is trained on the new phylogenetically consistent higher-order bacterial taxonomy proposed by Garrity et al. (3). The classifier takes a sequence and assigns it to the lowest taxonomic node possible within a certain degree of confidence. The RDP Classifier interface has been designed to make it easier to work with large numbers of query sequences. The RDP uses the RDP Classifier internally to classify all harvested sequences.
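The idea of assigning a sequence to the lowest node that meets a confidence requirement can be illustrated with a short Python sketch. It does not implement the naive Bayesian model itself. It assumes that per-rank confidence scores are already available, and the function name, the 0.8 threshold, and the scores in the example are all hypothetical.

```python
from typing import List, Optional, Tuple


def lowest_confident_node(
    assignments: List[Tuple[str, float]], threshold: float = 0.8
) -> Optional[str]:
    """Walk a root-to-leaf lineage and return the deepest taxon whose
    confidence meets the threshold, stopping at the first rank that fails."""
    lowest = None
    for taxon, confidence in assignments:
        if confidence < threshold:
            break
        lowest = taxon
    return lowest


# Illustrative lineage with made-up confidence scores, ordered root first.
example = [
    ("Bacteria", 1.00),
    ("Proteobacteria", 0.97),
    ("Gammaproteobacteria", 0.91),
    ("Enterobacteriales", 0.62),
]
print(lowest_confident_node(example))
```

With these made-up scores the walk stops above Enterobacteriales, because that rank falls below the assumed threshold.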

Library Compare
The Sample Library Comparison Tool takes two sequence libraries, classifies them with the RDP Classifier, then performs a statistical test to determine the significance of the differences between the two libraries.

Sequence Match
Sequence Match has been re-implemented as a K-nearest neighbor classifier. By default it assigns a query sequence to the lowest taxonomic level which contains all K-nearest neighbors from the RDP database. The K-nearest neighbors for each query sequence can be displayed in the RDP Taxonomy, then copied to the Hierarchy Browser for download. Sequence Match has also been designed with an eye towards working with large numbers of query sequences.
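The default assignment rule described above, placing a query at the lowest taxonomic level that contains all K nearest neighbors, amounts to finding the longest lineage prefix the neighbors share. The Python sketch below shows only that step. It assumes the neighbors have already been found and that each comes with a root-first lineage; the function name and the example lineages are illustrative, not RDP output.

```python
from typing import List, Sequence


def lowest_shared_taxon(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest root-first lineage prefix shared by every neighbor."""
    if not lineages:
        return []
    shared: List[str] = []
    # zip stops at the shortest lineage, so ragged input is handled.
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Illustrative lineages for K = 3 neighbors.
neighbors = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriales"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriales"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Methylococcales"],
]
print(lowest_shared_taxon(neighbors)[-1])
```

Here two neighbors agree down to the order but the third does not, so the query would be placed at the class level.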

Probe Match
Probe Match is a re-implementation of the RDP 8.1 Probe Match. Probe Match takes a single probe as a query and searches the RDP database for matching sequences. The E. coli target regions for the probe can be specified to mask out partial sequences that lack data in the relevant region.



1. Brown, M.P.S. (2000) Small Subunit Ribosomal RNA Modeling Using Stochastic Context-Free Grammar. In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB 2000). San Diego, California, USA, pp. 57-66.

2. Cannone, J.J., Subramanian, S., Schnare, M.N., Collet, J.R., D'Souza, L.M., Du, Y., Feng, B., Lin, N., Madabusi, L.V., Muller, K.M., Pande, N., Shang, Z., Yu, N., and Gutell, R.R. (2002) The Comparative RNA Web (CRW) Site: An Online Database of Comparative Sequence and Structure Information for Ribosomal, Intron, and other RNAs. BioMed Central Bioinformatics, 3, 2. [PubMed]

3. Garrity, G.M., Lilburn, T.G., Cole, J.R., Harrison, S.H., Euzeby, J., and Tindall, B.J. (2007) The Taxonomic Outline of Bacteria and Archaea. TOBA Release 7.7, March 2007. Michigan State University Board of Trustees. [http://www.taxonomicoutline.org/]

4. Barns, S.M., Cain, E.C., Sommerville, L., and Kuske, C.R. (2007) Acidobacteria phylum sequences in uranium-contaminated subsurface sediments greatly expand the known diversity within the phylum. Appl. Environ. Microbiol., 73(9), 3113-6. [PubMed]

5. Sangwan, P., Chen, X., Hugenholtz, P., and Janssen, P.H. (2004) Chthoniobacter flavus gen. nov., sp. nov., the first pure-culture representative of subdivision two, Spartobacteria classis nov., of the phylum Verrucomicrobia. Appl. Environ. Microbiol., 70(10), 5875-81. [PubMed]

6. Harris, J.K., Kelley, S.T., and Pace, N.R. (2004) New Perspective on Uncultured Bacterial Phylogenetic Division OP11. Appl. Environ. Microbiol., 70(2), 845-9. [PubMed]



How often will Release 9 be updated?

The RDP's goal is to update Release 9's dataset on a monthly basis. Each update consists of new 16S rRNA sequences that have appeared in GenBank in the previous month, as well as modifications to existing sequences. Sequences appear in the RDP Preview Release approximately one month after they are released in GenBank. Various confounding factors may occasionally slow this release cycle.

Can I download the entire alignment?

Yes, the entire alignment is available in GenBank and FASTA format here. The Hierarchy Browser can also be used to download the entire alignment or a subset of it.

Why are some bases in the alignment data printed in lowercase?

Each sequence is aligned to an alignment model constructed from a collection of representative sequences and the secondary structure of the 16S rRNA gene. Base positions that cannot be matched to positions on the model (model inserts) are represented in lowercase letters to make them easily recognizable. No further attempt is made to align the inserts. When doing treeing or other processing that relies on aligned data, it may be best to mask out regions containing these inserts.
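One possible way to mask insert regions before treeing is to drop every alignment column in which any sequence carries a lowercase base. The Python sketch below assumes the alignment is held as equal-length strings keyed by sequence name and that gaps are written as "-" or "."; the function name and the toy sequences are made up for illustration.

```python
from typing import Dict


def mask_insert_columns(alignment: Dict[str, str]) -> Dict[str, str]:
    """Drop every column in which at least one sequence has a lowercase
    (model insert) base, keeping the remaining columns in order."""
    if not alignment:
        return {}
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    # Gap characters are not lowercase letters, so they never trigger a drop.
    keep = [
        col for col in range(length)
        if not any(row[col].islower() for row in rows)
    ]
    return {
        name: "".join(seq[col] for col in keep)
        for name, seq in alignment.items()
    }


# Toy alignment: columns 3 and 4 contain lowercase insert bases.
toy = {
    "seqA": "AUGc-GA",
    "seqB": "AUG-uGA",
    "seqC": "AUG--GA",
}
masked = mask_insert_columns(toy)
print(masked["seqA"])
```

Dropping whole columns keeps the remaining positions aligned across all sequences, which is what distance and treeing methods generally expect.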

How do I convert between Preview Release Sequence IDs ("S000380829") and Release 8.1 Sequence IDs ("E.coli")?

All Release 9 sequence records and most RDP 8.1 sequence records have a corresponding GenBank Accession. For the vast majority of sequences, the sequence IDs can be converted by matching the GenBank Accessions.

Example: Release 8.1 SID "E.coli" <------> GenBank Accession "J01695" <------> Release 9 SID "S000380829".

There are some exceptions to this rule. Some GenBank Accessions contain multiple 16S rRNA records, which means a single Accession can map to several Sequence IDs. Also, accessions are sometimes updated, retired, replaced or removed, which can make the Accession recorded in Release 8.1 invalid.
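The conversion described above is a two-step lookup through the GenBank Accession. The Python sketch below assumes the user has already assembled two tables from their own data: Release 8.1 IDs with their Accessions, and Release 9 SIDs with their Accessions. The function names are hypothetical, and the only record used is the example given above. The result is a list because one Accession can map to several SIDs, and it is empty when the Accession is missing or no longer valid.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_accession_index(
    release9_records: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Group Release 9 SIDs by GenBank Accession (one Accession may hold
    several 16S rRNA records, hence a list per Accession)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for sid, accession in release9_records:
        index[accession].append(sid)
    return dict(index)


def convert_sid(
    old_sid: str,
    old_to_accession: Dict[str, str],
    accession_index: Dict[str, List[str]],
) -> List[str]:
    """Map a Release 8.1 ID to Release 9 SIDs via its GenBank Accession.
    Returns an empty list when no match can be made."""
    accession = old_to_accession.get(old_sid)
    if accession is None:
        return []
    return accession_index.get(accession, [])


# The single example pair given in the text above.
old_to_accession = {"E.coli": "J01695"}
accession_index = build_accession_index([("S000380829", "J01695")])
print(convert_sid("E.coli", old_to_accession, accession_index))
```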


Update History


Update 59: 8,190 new sequences added (489,840 total). This release includes a beta version of the new Taxomatic visualization tool.


Update 58: 9,858 new sequences added (481,650 total).


Update 57: 20,247 new sequences added (471,792 total). Release includes a new Genome Browser for viewing sequences from whole-genome projects.


Update 56: 10,654 new sequences added (451,545 total).


Update 55: 22,979 new sequences added (440,891 total). Weekly RDP survey is out.


Update 54: 8,220 new sequences added (417,912 total).


Update 53: 18,719 new sequences added (409,692 total). The hierarchy model used by RDP Classifier has been updated to TOBA Release 7.7 (The Taxonomic Outline of Bacteria and Archaea).


Update 52: 10,453 new sequences added (390,973 total).


Update 51: 12,114 new sequences added (380,520 total). It is now possible to find sequences from completed genomes by searching with the GenBank Project ID. For example, to find Aquifex aeolicus VF5 (Project ID 215), use the search term "PROID:215" in the Hierarchy Browser or link directly to the URL http://rdp.cme.msu.edu/search?searchStr=PROID:215.


Update 50: 16,250 new sequences added (368,406 total). Seqmatch results can be downloaded as a tab-delimited text file and imported into a spreadsheet program such as Excel for additional analysis.


Update 49: 15,966 new sequences added (351,796 total). Upgraded Tree Builder now includes bootstrap confidence estimates.


New video tutorials available in Flash, Quicktime and Windows Media formats.


Update 48: 27,932 new sequences added (335,830 total).


Update 47: 5,561 new sequences added (307,898 total).


Update 46: 16,080 new sequences added (302,337 total).


Update 45: 12,957 new sequences added (286,257 total).


Update 44: 4,134 new sequences added (273,300 total).


Update 43: 7,136 new sequences added (269,166 total). Selection on Seqmatch has been enhanced.


Update 42: 8,217 new sequences added (262,030 total). Introduces new analysis tool Tree Builder and video tutorials. Several RDP analysis services have been modified to provide extra features, including Seqmatch, Classifier and Hierarchy Browser.


Introduces simple chromatogram upload and research buddies in myRDP. A quality filter can be applied to view only good-quality public sequences.


Update 41: 9,904 new sequences added (253,813 total).


Update 40: 17,750 new sequences added (243,909 total). Login for myRDP and the rRNA Pipeline has been unified. myRDP now supports updating project notes, and sequences uploaded to myRDP can be viewed in the Classifier.


Update 39: 15,183 new sequences added (226,159 total). Introduced a new feature, myRDP, which incorporates the RDP pipeline.


Update 38: 917 new sequences added (210,976 total). The hierarchy model used by RDP Classifier has been updated to release 6.0 of Bergey's Manual of Systematic Bacteriology. The newly trained Classifier allows classification of both bacterial and archaeal 16S rRNA sequences.


Update 37: 4,894 new sequences added (210,059 total).


Update 36: 7,332 new sequences added (205,165 total). RNA distance matrix download and ARB import help are available.


Update 35: 3,137 new sequences added (197,833 total).


Update 34: 3,911 new sequences added (194,696 total).


Update 33: 5,795 new sequences added (190,785 total).


Update 32: 7,110 new sequences added (184,990 total).


Update 31: 6,627 new sequences added (177,880 total).


Update 30: 10,967 new sequences added (171,253 total).


Update 29: 4,578 new sequences added (160,286 total).


Update 28: 19,353 new sequences added (155,708 total). Release of the Library Compare analysis tool.


Update 27: 7,979 new sequences added (136,355 total).


Update 26: 4,211 new sequences added (128,376 total).


Update 25: 4,354 new sequences added (124,165 total).


Update 24: 5,580 new sequences added (119,811 total).


Update 23: 5,450 new sequences added (114,231 total).


Update 22: 7,149 new sequences added (108,781 total).


The Preview Release has been made the official RDP-II Release 9.


4,504 new sequences added (101,632 total). Release of the completely new Sequence Match and Probe Match. Updated Hierarchy Browser provides the ability to download a newick-formatted tree of selected sequences. Added a status page to RDP Classifier to prevent browser timeouts during classification of large numbers of sequences.


First public release of the RDP Classifier. Release of the completely new Hierarchy Browser.


1,231 new sequences added (97,128 total). Sequences in the RDP Hierarchy have been reclassified using vetted sequences from an upcoming 2004 release of the Taxonomic Outline of the Prokaryotes from Bergey's Manual of Systematic Bacteriology.


3,084 new sequences added (95,897 total).


4,953 new sequences added (92,813 total). The Sequence Match dataset was not updated. See this for more info.


1,791 new sequences added (87,860 total).


2,109 new sequences added (86,069 total).


4,057 new sequences added (83,960 total). Sequences in the RDP Hierarchy have been reclassified using release 4.0 of the Taxonomic Outline of the Prokaryotes from Bergey's Manual of Systematic Bacteriology.


1,688 new sequences added (79,903 total).


1,849 new sequences added (78,215 total).


2,256 new sequences added (76,366 total).


1,484 new sequences added (74,110 total).


1,535 new sequences added (72,626 total).


1,888 new sequences added (71,091 total).


3,331 new sequences added (69,203 total). 51 sequences identified as chimeric by Hugenholtz and Huber (84KB pdf article) have been moved into the hierarchy node labeled Putative Chimera (level 1.31 in the RDP Hierarchy, level 1.3.3 on the RDP's version of NCBI's Hierarchy).


1,175 new sequences added (65,872 total). The preview alignment is now available as a separate download.


1,935 new sequences added (64,697 total). Sequence Descriptions have been changed to match names given in DSMZ's Bacterial Nomenclature Up-to-date where available.


2,488 new sequences added (62,762 total). Hierarchy Browser Preview now displays the rank name next to the node name (e.g. "Proteobacteria (phylum)", "Methylococcales (order)").


2,501 new sequences added (60,274 total). RDP Hierarchy changed to reflect an updated Bergey's Taxonomy.


5,888 new sequences added (57,773 total).


1,830 new sequences added (51,885 total). Hierarchy Browser Preview search capabilities updated. Cosmetic updates for all tools.


Initial release with 50,055 sequences and two analysis tools.
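The Update 51 note above describes searching by GenBank Project ID with a "PROID:" search term. As a small sketch, the Python below builds a search link of the same form as the one quoted in that note. The function name is hypothetical, and the only assumption about the site is the URL pattern shown in the note, which may no longer be served.

```python
from urllib.parse import urlencode

# Base URL copied from the Update 51 note.
RDP_SEARCH_URL = "http://rdp.cme.msu.edu/search"


def project_search_url(project_id: int) -> str:
    """Build a Hierarchy Browser search link for a GenBank Project ID."""
    # safe=":" keeps the colon literal, matching the link in the note.
    query = urlencode({"searchStr": f"PROID:{project_id}"}, safe=":")
    return f"{RDP_SEARCH_URL}?{query}"


# Aquifex aeolicus VF5 is Project ID 215 in the note.
print(project_search_url(215))
```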


Questions/comments: rdpstaff@msu.edu
Creative Commons License: Attribution-ShareAlike
