
How a sequence is tagged as suspect


Every sequence in the RDP is put into one of two groups: ok quality or suspect quality.

This is accomplished using a combination of RDP Seqmatch and Pintail (Ashelford et al., "At least one in twenty 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies"). Seqmatch pulls the two nearest neighbors of a given sequence from the ok quality set. To weed out systematic errors, we ensure that these three sequences do not share a publication. The three sequences are then run through Pintail, which provides a measure of whether the query sequence is anomalous, given its neighbors. Anomalous sequences are marked as suspect quality; all other sequences are added to the ok quality set. This process is repeated for every sequence in the RDP, starting with near-full-length isolates, then near-full-length environmental sequences, followed by short sequences.
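The procedure described above can be sketched roughly as follows. This is an illustration only, not the RDP's actual code. The names `Seq`, `share_publication`, and `classify` are hypothetical, and Seqmatch and Pintail are represented by caller-supplied functions (`nearest_neighbors` and `is_anomalous`) rather than by any real API. Two details are assumptions not stated on this page: candidates that share a publication are skipped in favor of the next-nearest neighbor, and a sequence with fewer than two eligible neighbors is provisionally placed in the ok quality set.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Seq:
    """Hypothetical sequence record: an identifier plus its publications."""
    seq_id: str
    publications: frozenset = frozenset()


def share_publication(seqs: Sequence[Seq]) -> bool:
    """Return True if any two of the given sequences cite a common publication."""
    for i, a in enumerate(seqs):
        for b in seqs[i + 1:]:
            if a.publications & b.publications:
                return True
    return False


def classify(
    batches: Iterable[Iterable[Seq]],
    nearest_neighbors: Callable[[Seq, List[Seq]], List[Seq]],
    is_anomalous: Callable[[Seq, List[Seq]], bool],
) -> Tuple[List[Seq], List[Seq]]:
    """Split sequences into (ok, suspect) lists.

    batches: groups in processing order, e.g. near-full-length isolates,
        then near-full-length environmental sequences, then short sequences.
    nearest_neighbors: stand-in for Seqmatch; returns members of the ok set
        ranked from most to least similar to the query.
    is_anomalous: stand-in for Pintail; judges the query given two neighbors.
    """
    ok: List[Seq] = []
    suspect: List[Seq] = []
    for batch in batches:
        for query in batch:
            neighbors: List[Seq] = []
            for cand in nearest_neighbors(query, ok):
                # Assumption: skip candidates that share a publication
                # with the query or with a neighbor already chosen.
                if not share_publication([query, cand, *neighbors]):
                    neighbors.append(cand)
                if len(neighbors) == 2:
                    break
            if len(neighbors) == 2 and is_anomalous(query, neighbors):
                suspect.append(query)
            else:
                # Assumption: with fewer than two eligible neighbors,
                # the sequence is provisionally treated as ok quality.
                ok.append(query)
    return ok, suspect
```

Because the ok quality set grows as sequences are processed, the order of the batches matters: later sequences are compared against neighbors that have already passed the check.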

The details of the procedures followed are summarized in the flowchart below:

[Figure: quality flow chart]


The RDP re-categorizes sequences with every release. A sequence may move from ok quality to suspect quality (or vice versa) as new sequences provide more evidence.

 

Questions/comments: rdpstaff@msu.edu
