RDPipeline Classifier

Quick Overview

Assign bacterial and archaeal 16S rRNA or fungal 28S gene sequences to the new phylogenetically consistent higher-order bacterial and fungal taxonomy using the RDP Classifier

Input: sequences in FASTA, GenBank or EMBL format

(sample bacterial 16S input file)

Output: four text files:

  1. A file of average bootstrap scores for each rank (bootstrap_conf.txt)
  2. Sequence-by-sequence classification results including confidence scores at each level of the hierarchy (classifications.txt)
  3. The sequence count for each taxon in the hierarchy (hierarchy.txt)
  4. The failed sequences list (failed_sequences.txt)

(sample bacterial 16S output file)

The Process:

The classifier tool main page can be found at RDPipeline Classifier

To run the Classifier on a set of sequences, select the gene from the drop-down menu and upload a file of sequences in FASTA, GenBank or EMBL format. Input sequences should be at least 50 bp long for accurate results. Both uppercase and lowercase letters are accepted. Optionally, adjust the confidence cutoff.
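Before uploading, it can help to check a FASTA file for reads shorter than the 50 bp recommended above. The following is a minimal sketch, not part of RDPipeline: the function names `read_fasta` and `short_sequences` are hypothetical, and it assumes plain FASTA input (GenBank and EMBL files are not handled).

```python
from typing import Iterable, Iterator, List, Tuple

MIN_LENGTH = 50  # minimum length recommended in the tutorial text


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    seq_id = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The ID is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def short_sequences(
    lines: Iterable[str], min_length: int = MIN_LENGTH
) -> List[Tuple[str, int]]:
    """Return (sequence_id, length) for every read below min_length."""
    return [
        (seq_id, len(seq))
        for seq_id, seq in read_fasta(lines)
        if len(seq) < min_length
    ]


if __name__ == "__main__":
    # Made-up reads for illustration; upper and lower case are both fine.
    example = [">read1 sample", "ACGT" * 20, ">read2", "acgtacgt"]
    for seq_id, length in short_sequences(example):
        print(f"{seq_id}: {length} bp is below {MIN_LENGTH} bp")
```

To check a real file, pass an open file handle instead of the example list, for instance `short_sequences(open("my_reads.fasta"))`, where the file name is a placeholder.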

Download the sample input files . . .
for this tutorial. The sample input archive contains two FASTA sequence files (*_tit_trimmed.fasta).

Inside the 16S_inputfiles.zip file . . . are two FASTA files of trimmed sequences.

region 1 input file

The sequence file . . . is a larger file that contains FASTA formatted nucleotide reads:

sequence reads

Uploading your data . . .
If you are uploading several input files and want the results to appear in a single file, check the box "Treat all input files as one sample". Choose "Bacterial 16S" as the gene and "allRank" as the output format. If you have already clustered the same sequence set, you may instead choose the "biom" format and upload the .biom clustering results. For more help, see Performing Statistical Analysis with R/Bioconductor package Phyloseq.

SCREENSHOT of input form

Output files . . . for each tag are placed in a directory that contains the following files (download sample output zip file):

  • A table of bootstrap scores for each rank (bootstrap_conf.txt)
  • Sequence-by-sequence assignment details, including the confidence value (0 to 1) for the assignment at each level of the hierarchy (classifications.txt)
  • Sequence count at each level of the hierarchy, separated by sample (hierarchy.txt)
  • A text file that lists all failed sequences (failed_sequences.txt)
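The per-level confidence values in classifications.txt can be used to trim each assignment to the ranks you trust. The sketch below is illustrative only. It assumes each line is tab-separated, with the sequence ID first, an orientation field second, and then repeating (taxon, rank, confidence) triples; check this layout against your own file before relying on it. The names `Assignment`, `parse_classification_line` and `confident_lineage` are hypothetical, and the 0.8 cutoff is just an example value.

```python
from typing import List, NamedTuple, Tuple


class Assignment(NamedTuple):
    taxon: str
    rank: str
    confidence: float  # value between 0 and 1


def parse_classification_line(line: str) -> Tuple[str, List[Assignment]]:
    """Split one line into a sequence ID and its per-rank assignments.

    Assumed layout: id <TAB> orientation <TAB> taxon <TAB> rank <TAB> conf ...
    """
    fields = line.rstrip("\n").split("\t")
    seq_id = fields[0]
    triples = fields[2:]  # skip the assumed orientation field
    assignments = []
    # Walk the remaining fields three at a time, ignoring any partial triple.
    for i in range(0, len(triples) - 2, 3):
        taxon, rank, conf = triples[i:i + 3]
        assignments.append(Assignment(taxon.strip('"'), rank, float(conf)))
    return seq_id, assignments


def confident_lineage(
    assignments: List[Assignment], cutoff: float = 0.8
) -> List[Assignment]:
    """Keep ranks from the top down until confidence falls below cutoff."""
    kept = []
    for assignment in assignments:
        if assignment.confidence < cutoff:
            break
        kept.append(assignment)
    return kept


if __name__ == "__main__":
    # A made-up line for illustration, not taken from the sample output.
    line = ("read1\t\tRoot\trootrank\t1.0\tBacteria\tdomain\t0.95"
            "\tFirmicutes\tphylum\t0.62")
    seq_id, assignments = parse_classification_line(line)
    print(seq_id, [a.taxon for a in confident_lineage(assignments)])
```

Stopping at the first rank below the cutoff, rather than filtering ranks independently, keeps the reported lineage contiguous from the root downward.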

contents of output folder

The hierarchy file . . . contains the assignment counts for each sample. This can be imported into R for ordination analysis. For more help, see Performing Statistical Analysis with R/Bioconductor package Phyloseq.
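For a quick look at the per-sample counts before moving to R, the hierarchy file can also be read with a short script. This sketch assumes a tab-delimited table whose header row has the columns taxid, lineage, name and rank followed by one column per sample; those column names are an assumption, so compare them with the header of your own hierarchy.txt. The function name `counts_at_rank` is hypothetical.

```python
import csv
from typing import Dict, Iterable

# Assumed names of the non-sample columns in the header row.
META_COLUMNS = ("taxid", "lineage", "name", "rank")


def counts_at_rank(lines: Iterable[str], rank: str) -> Dict[str, Dict[str, int]]:
    """Return {taxon name: {sample: count}} for rows at the given rank."""
    reader = csv.DictReader(lines, delimiter="\t")
    table: Dict[str, Dict[str, int]] = {}
    for row in reader:
        if row.get("rank") != rank:
            continue
        # Every column that is not metadata is treated as a sample count.
        table[row["name"]] = {
            col: int(val)
            for col, val in row.items()
            if col is not None and col not in META_COLUMNS and val not in (None, "")
        }
    return table


if __name__ == "__main__":
    # Made-up rows for illustration, not taken from the sample output.
    example = [
        "taxid\tlineage\tname\trank\tsampleA\tsampleB",
        "0\tRoot;rootrank;\tRoot\trootrank\t10\t7",
        "1\tRoot;rootrank;Bacteria;domain;\tBacteria\tdomain\t9\t7",
    ]
    print(counts_at_rank(example, "domain"))
```

With a real file, pass an open handle, for example `counts_at_rank(open("hierarchy.txt", newline=""), "genus")`.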

Hierarchy output file

Go to : RDP Classifier Tutorial Page
