16S Supervised Workflow

Quick Overview
Initial processing -> Chimera removal -> Taxonomic classification -> Analysis

Step 1: Initial Processing

Initial processing tool main page

Initial processing tutorial

Supervised processing and analysis of 16S rRNA data starts with the initial processing tool. You may follow this tutorial with your own sequences, or download and use the input files provided in the Pipeline Initial Processing tutorial. Prepare your input: a sequence file in FASTA, FASTQ, or SFF format, a primer sequence, and a tag file. Then follow the Pipeline Initial Processing tutorial to begin processing your data.
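As a quick sanity check before uploading, the format of a sequence file can usually be guessed from its first bytes: FASTA records begin with '>', FASTQ records with '@', and binary SFF files with the magic string ".sff". The following is a minimal sketch, not part of the RDP pipeline; the function name guess_seq_format is hypothetical, and it assumes an uncompressed file and a `head` that supports `-c` (the GNU and BSD versions do).

```shell
#!/bin/sh
# Hypothetical helper (not part of the RDP tools): guess the format of a
# sequence file from its first four bytes. Assumes the file is not compressed.
guess_seq_format() {
    first4=$(head -c 4 "$1")
    case "$first4" in
        .sff) echo "sff" ;;    # SFF files start with the magic string ".sff"
        '>'*) echo "fasta" ;;  # FASTA records start with '>'
        @*)   echo "fastq" ;;  # FASTQ records start with '@'
        *)    echo "unknown" ;;
    esac
}
```

For example, `guess_seq_format sample.fastq` prints `fastq` for a well-formed FASTQ file; anything that matches none of the three signatures is reported as `unknown`.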


Step 2: Chimera removal

The sequence files produced by the initial processing tool need to be checked for chimeras. For this task we recommend using one of two software packages freely available online.

Option 1:

The first and easiest option is the web-based version of the Decipher chimera detection tool at http://decipher.cee.wisc.edu/FindChimeras.html. If your sequence file is under 10 MB, you may submit it to the Decipher web tool; make sure to check the “Short-length sequences” option. The results are usually emailed to you within a couple of hours.

Unfortunately, the Decipher web tool accepts only sequence files under 10 MB. If the sequence files for individual samples from initial processing are larger than that and you still want to use Decipher, you will have to either split each file and submit the pieces as separate jobs, or download Decipher and run it locally. RDP’s internal tests have shown that the command-line version of Decipher can be slow for large jobs on a single workstation, so in this case we recommend the UCHIME software package instead.
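Splitting a FASTA file by size alone would cut records in half, so pieces should be cut only at header lines. A minimal awk-based sketch, assuming the Decipher size limit can be treated as 10,000,000 bytes and that the input is plain ASCII FASTA; split_fasta and its output naming (prefix_1.fasta, prefix_2.fasta, ...) are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: split a FASTA file into pieces no larger than
# max_bytes (default 10000000), cutting only between records so that each
# piece is a valid FASTA file. A single record larger than the limit is
# written to a piece of its own.
# Usage: split_fasta input.fasta prefix [max_bytes]
split_fasta() {
    LC_ALL=C awk -v prefix="$2" -v max="${3:-10000000}" '
        # Write the buffered record, starting a new piece first if the
        # record would push the current piece over the limit.
        function flush() {
            if (rec == "") return
            if (size > 0 && size + length(rec) > max) {
                close(out)
                n++
                size = 0
            }
            out = prefix "_" n ".fasta"
            printf "%s", rec > out
            size += length(rec)
            rec = ""
        }
        BEGIN { n = 1; size = 0; max += 0 }
        /^>/  { flush() }    # a header line ends the previous record
              { rec = rec $0 "\n" }
        END   { flush() }
    ' "$1"
}
```

For example, `split_fasta sample1.fasta sample1_part` writes sample1_part_1.fasta, sample1_part_2.fasta, and so on, each of which could be submitted as a separate Decipher job.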

Option 2:

UCHIME is a faster, more accurate alternative to Decipher for users who are comfortable with a command-line interface and with compiling software from source code. RDP's testing of chimera-checking tools suggests that UCHIME has higher sensitivity to chimeras and a lower false-positive rate than Decipher. UCHIME can be obtained from http://drive5.com/uchime/ as source code or as a precompiled Linux binary. For usage tips, refer to http://drive5.com/uchime/uchime_quickref.pdf and http://drive5.com/uchime/practical_uchime.pdf.

After running Decipher or UCHIME, you need a text file that lists the IDs of the chimeric sequences, one per line. Decipher’s email output provides this list; simply save it as a text file. UCHIME instead writes a detailed tab-separated report with the sequence ID in the second column and the chimera status (Y for chimeric, N for non-chimeric, ? for uncertain) in the last column. From a command-line interface you can use the command:

grep -E '\?$|Y$' Native_1_4_A.uchime.txt | cut -f2 > ids.txt

to write the IDs of all sequences flagged as chimeric (Y) or uncertain (?) to ids.txt, one per line. Once you have the ID file, retrieve the original FASTA file that was checked for chimeras. These two files are the inputs to the FASTA sequence selection web tool, which produces a FASTA file containing only the non-chimeric sequences. Make sure to check the ‘exclude sequences’ box to get a chimera-free sequence file. If you also want the set of chimeric sequences, save a second FASTA file with the ‘exclude sequences’ box unchecked.
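If you prefer to do this selection locally, the same include/exclude logic can be sketched in awk. This is a hypothetical stand-in, not the RDP web tool: select_fasta is an illustrative name, and it assumes that a record's ID is the first whitespace-delimited word after '>' and matches the IDs in the ID file exactly.

```shell
#!/bin/sh
# Hypothetical local stand-in for the FASTA sequence selection web tool.
# Prints the records of a FASTA file whose IDs are (mode "include") or are
# not (mode "exclude") listed in an ID file with one ID per line. Any mode
# other than "exclude" is treated as "include".
# Assumption: a record's ID is the first word of its header, without '>'.
# Usage: select_fasta ids.txt input.fasta exclude > nonchimeric.fasta
select_fasta() {
    awk -v mode="$3" '
        # Lines from the first file (the ID list) fill the lookup table.
        FILENAME == ARGV[1] { if ($1 != "") ids[$1] = 1; next }
        # On each header, decide whether this record is printed.
        /^>/ {
            id = substr($1, 2)
            keep = (mode == "exclude") ? !(id in ids) : (id in ids)
        }
        keep    # print every line of a record that is kept
    ' "$1" "$2"
}
```

Running it once with `exclude` and once with `include` yields the chimera-free and chimeric sets respectively, mirroring the two ways of using the web tool. Matching on the ID file's first word also tolerates the empty ID list produced when no sequence is flagged: with `exclude`, every record is kept.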


Step 3: Run RDP Classifier

RDP Classifier tutorial


Step 4: Analyze results

Performing Statistical Analysis with R/Bioconductor package Phyloseq


Return to workflow: 16S (unsupervised), 16S (supervised) or functional gene (unsupervised)
