
How a sequence is tagged as suspect


Every sequence in the RDP is put into one of two groups: ok quality or suspect quality.

 

Starting with RDP release 10.29 (June 2012), the RDP uses UCHIME with a reference database to detect chimeras (Edgar et al., "UCHIME improves sensitivity and speed of chimera detection," Bioinformatics, doi: 10.1093/bioinformatics/btr381).

Prior to RDP release 10.29, quality checking was accomplished using a combination of RDP Seqmatch and Pintail (Ashelford et al., "At least one in twenty 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies"). Seqmatch was used to pull the two nearest neighbors of a given sequence from the ok quality set. To weed out systematic errors, we ensured that these three sequences (the query and its two neighbors) did not share a publication. The three sequences were then run through Pintail, which provides a measure of whether the query sequence is an anomaly, given its neighbors. Anomalous sequences were marked as suspect quality; all other sequences were added to the ok quality set. This process was repeated for every sequence in the RDP, starting with near-full-length isolates, then near-full-length environmental sequences, followed by short sequences.
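The loop described above can be sketched in Python. This is only an illustrative sketch, not the RDP's actual code: the `Record` type, the `tier` field, and the `rank_neighbors` and `is_anomalous` callables are hypothetical stand-ins for the sequence records, the processing order, Seqmatch, and Pintail. The page does not say what happens when fewer than two independent neighbors are available, so the sketch assumes such sequences are set aside as unchecked, and it assumes a seed ok quality set is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical sequence record; field names are assumptions."""
    seq_id: str
    publications: FrozenSet[str]  # identifiers of publications citing the record
    tier: int  # 0 = near-full-length isolate, 1 = near-full-length environmental, 2 = short


def pick_independent_neighbors(query: Record, ranked: Iterable[Record]) -> List[Record]:
    """Take the two best-ranked neighbors such that the query and both
    neighbors have no publication in common."""
    chosen: List[Record] = []
    seen = set(query.publications)
    for candidate in ranked:
        if seen.isdisjoint(candidate.publications):
            chosen.append(candidate)
            seen |= candidate.publications
            if len(chosen) == 2:
                break
    return chosen


def classify(
    records: Iterable[Record],
    rank_neighbors: Callable[[Record, Sequence[Record]], Iterable[Record]],
    is_anomalous: Callable[[Record, Sequence[Record]], bool],
    seed_ok: Iterable[Record] = (),
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Return (ok, suspect, unchecked).

    rank_neighbors stands in for Seqmatch (nearest first, drawn from the ok set);
    is_anomalous stands in for Pintail. Both are supplied by the caller.
    """
    ok: List[Record] = list(seed_ok)
    suspect: List[Record] = []
    unchecked: List[Record] = []  # assumption: no verdict without two neighbors
    # sorted() is stable, so records keep their input order within each tier.
    for query in sorted(records, key=lambda r: r.tier):
        neighbors = pick_independent_neighbors(query, rank_neighbors(query, ok))
        if len(neighbors) < 2:
            unchecked.append(query)
        elif is_anomalous(query, neighbors):
            suspect.append(query)
        else:
            ok.append(query)  # later queries may now use this record as a neighbor
    return ok, suspect, unchecked
```

Because accepted sequences are appended to the ok quality set as the loop runs, the processing order matters, which is why the sketch handles the most trusted tier first.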

The details of the procedures followed are summarized in the flowchart below:

[Figure: quality flow chart]


The RDP re-categorizes sequences with every release. A sequence may move from ok quality to suspect quality (or vice versa) as new sequences provide more evidence.

 

Questions/comments: rdpstaff@msu.edu
Creative Commons License: Attribution-ShareAlike
