RDP MultiClassifier -- a command line tool

Quick Overview

Purpose: Assign bacterial and archaeal 16S rRNA or fungal LSU sequences to the new phylogenetically consistent higher-order bacterial and fungal taxonomy.

Input: Single or multiple sequence files in FASTA, GenBank or EMBL format.


Output: A compressed results folder containing the sequence count for each taxon in the hierarchy and sequence-by-sequence assignment details.


The Process:

The command line RDP MultiClassifier uses the RDP naïve Bayesian Classifier to classify one or more files containing 16S rRNA or fungal LSU gene sequences. For each taxon, listed in hierarchy order, it reports the number of sequences assigned in each sample, with one column per sample. This hierarchy count output is similar to an OTU table from clustering results and can easily be imported into statistical packages such as EstimateS or R to compare samples.
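As a rough illustration of working with this OTU-table-like output, the Python sketch below reads a tab-delimited hierarchy count file and converts the counts at one rank into per-sample relative abundances. It assumes a header row with metadata columns named taxid, lineage, name and rank, followed by one count column per sample. These column names are assumptions, not taken from this tutorial, so check them against the header of your own hierarchy file. The function name relative_abundance is hypothetical.

```python
import csv
import io
from typing import Dict, TextIO

# ASSUMPTION: hypothetical metadata column names; check the header of your
# own hierarchy file. Every other column is treated as a per-sample count.
METADATA_COLUMNS = {"taxid", "lineage", "name", "rank"}


def relative_abundance(hier_file: TextIO, rank: str) -> Dict[str, Dict[str, float]]:
    """Return {sample: {taxon: fraction}} for the taxa at one rank.

    Fractions are relative to the sequences assigned at that rank in each
    sample, so sequences left unclassified at that rank are not counted.
    """
    reader = csv.DictReader(hier_file, delimiter="\t")
    samples = [c for c in (reader.fieldnames or []) if c not in METADATA_COLUMNS]
    counts: Dict[str, Dict[str, int]] = {s: {} for s in samples}
    for row in reader:
        if row["rank"] != rank:
            continue
        for s in samples:
            # Taxa sharing a name at the same rank are summed.
            counts[s][row["name"]] = counts[s].get(row["name"], 0) + int(row[s])
    result: Dict[str, Dict[str, float]] = {}
    for s, taxa in counts.items():
        total = sum(taxa.values())
        # Guard against a sample with no sequences at this rank.
        result[s] = {t: (n / total if total else 0.0) for t, n in taxa.items()}
    return result


if __name__ == "__main__":
    # Toy table with made-up counts, for illustration only.
    demo = io.StringIO(
        "taxid\tlineage\tname\trank\tsampleA\tsampleB\n"
        "1\tRoot;\tRoot\trootrank\t10\t4\n"
        "2\tRoot;Bacteria;\tBacteria\tdomain\t10\t4\n"
        "3\tRoot;Bacteria;Proteobacteria;\tProteobacteria\tphylum\t6\t1\n"
        "4\tRoot;Bacteria;Firmicutes;\tFirmicutes\tphylum\t4\t3\n"
    )
    print(relative_abundance(demo, "phylum"))
```

Dividing by each sample's total at the chosen rank puts samples with different numbers of sequences on a common scale before they are compared.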

Download the program from SourceForge
You can download the latest version, rdp_multiclassifier_x.x.zip, from SourceForge. No installation is necessary, provided Java is installed. At the time this tutorial was written, the latest version of MultiClassifier was 1.1.

Download the sample input files
The sample input files are stored in the sample classifier_input zip file, which contains four sequence files from the output of the initial processing tutorial.

The main file used to run MultiClassifier is MultiClassifier.jar, located in the rdp_multiclassifier_1.1 folder downloaded from SourceForge. In the commands shown below, replace PATH with the path to the directory containing MultiClassifier.jar on your system, so that /PATH/MultiClassifier.jar becomes, for example, /home/Downloads/rdp_multiclassifier_1.1/MultiClassifier.jar.

Example commands
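From a terminal, run the following command to classify all of the sample files in the multiclassifier_input folder. Here --conf=0.5 sets the confidence cutoff to 0.5, and -Xmx1g sets the maximum Java heap size to 1 GB: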

java -Xmx1g -jar /PATH/MultiClassifier.jar --conf=0.5 --hier_outfile=classification_hier.txt --assign_outfile=classification_detail.txt multiclassifier_input/*


Alternatively, list the individual files to be classified:

java -Xmx1g -jar /PATH/MultiClassifier.jar --conf=0.5 --hier_outfile=classification_hier.txt --assign_outfile=classification_detail.txt multiclassifier_input/Native_1_2_A_trimmed.fasta multiclassifier_input/USGA_1_7_A_trimmed.fasta

More general usage information

java -Xmx1g -jar /PATH/MultiClassifier.jar [--gene=<gene>] [--train_propfile=<file>] [--assign_outfile=<file>] [--hier_outfile=<file>] [--shortseq_outfile=<file>] [--conf=<confidence_cutoff>] [--minWords=<min_words_per_bootstrap>] [--bootstrap_out=<file>] [--format=allrank,fixrank,db] sample_fasta_file[,dupCountInfile]...

At least one sample_fasta_file is required. Multiple sequence files are separated by spaces. Each file name may optionally be followed, without a space, by a comma and a dupCountInfile.
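When many samples are processed, it can help to build the command programmatically. The Python sketch below assembles the same argument list as the example commands, using only options that appear in the usage line, and attaches an optional duplicate-count file to its sample with a comma, following the sample_fasta_file[,dupCountInfile] syntax. The function name build_multiclassifier_cmd and its defaults are hypothetical, and /PATH/ is the same placeholder used in the commands.

```python
import shlex
from typing import List, Optional, Sequence


def build_multiclassifier_cmd(
    jar_path: str,
    fasta_files: Sequence[str],
    conf: float = 0.5,
    hier_outfile: str = "classification_hier.txt",
    assign_outfile: str = "classification_detail.txt",
    dup_count_files: Optional[Sequence[Optional[str]]] = None,
    max_heap: str = "1g",
) -> List[str]:
    """Assemble the argument list for one MultiClassifier run (hypothetical helper)."""
    if not fasta_files:
        raise ValueError("at least one sample_fasta_file is required")
    if dup_count_files is not None and len(dup_count_files) != len(fasta_files):
        raise ValueError("dup_count_files must match fasta_files one-to-one")
    cmd = [
        "java", f"-Xmx{max_heap}", "-jar", jar_path,
        f"--conf={conf}",
        f"--hier_outfile={hier_outfile}",
        f"--assign_outfile={assign_outfile}",
    ]
    for i, fasta in enumerate(fasta_files):
        dup = dup_count_files[i] if dup_count_files else None
        # Optional duplicate-count file joined with a comma: fasta[,dupCountInfile]
        cmd.append(f"{fasta},{dup}" if dup else fasta)
    return cmd


if __name__ == "__main__":
    cmd = build_multiclassifier_cmd(
        "/PATH/MultiClassifier.jar",
        [
            "multiclassifier_input/Native_1_2_A_trimmed.fasta",
            "multiclassifier_input/USGA_1_7_A_trimmed.fasta",
        ],
    )
    # Prints the second example command from this tutorial.
    print(shlex.join(cmd))
    # To actually run it (requires Java and the real jar path):
    # subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run, rather than a single command string, avoids shell quoting problems with unusual file names.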


The results are two text files:

  1. Sequence-by-sequence assignment details (--assign_outfile), including the confidence value (0 to 1) for the assignment at each level of the hierarchy
  2. Sequence counts at each level of the hierarchy, separated by sample (--hier_outfile)
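To show how the per-level confidence values in the assignment detail file relate to a cutoff such as --conf=0.5, the Python sketch below parses one line and keeps ranks from the root downward until the confidence drops below the cutoff. It assumes a tab-delimited layout of sequence ID, an optional orientation column, and then repeating name, rank, confidence triples. This layout is an assumption based on common RDP Classifier output, not something this tutorial specifies, and it may differ with --format, so verify it against your own classification_detail.txt. The function names parse_detail_line and truncate_at_confidence are hypothetical.

```python
from typing import List, Tuple

# (taxon name, rank, confidence)
Assignment = Tuple[str, str, float]


def parse_detail_line(line: str) -> Tuple[str, List[Assignment]]:
    """Split one assignment-detail line into a sequence ID and rank triples."""
    fields = line.rstrip("\r\n").split("\t")
    seq_id, rest = fields[0], fields[1:]
    # ASSUMPTION: an optional orientation column may sit between the ID and
    # the triples; if one field is left over, drop it.
    if len(rest) % 3 == 1:
        rest = rest[1:]
    if len(rest) % 3 != 0:
        raise ValueError(f"unexpected field layout for {seq_id!r}")
    return seq_id, [
        (rest[i], rest[i + 1], float(rest[i + 2]))
        for i in range(0, len(rest), 3)
    ]


def truncate_at_confidence(triples: List[Assignment], cutoff: float) -> List[Assignment]:
    """Keep ranks from the root downward, stopping at the first one below cutoff."""
    kept: List[Assignment] = []
    for name, rank, conf in triples:
        if conf < cutoff:
            break
        kept.append((name, rank, conf))
    return kept


if __name__ == "__main__":
    # Made-up example line, for illustration only.
    line = (
        "seq1\t\tRoot\trootrank\t1.0\tBacteria\tdomain\t1.0"
        "\tFirmicutes\tphylum\t0.62\tBacilli\tclass\t0.31\n"
    )
    seq_id, triples = parse_detail_line(line)
    # prints: seq1 ['Root', 'Bacteria', 'Firmicutes']
    print(seq_id, [name for name, _, _ in truncate_at_confidence(triples, 0.5)])
```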

The output files can be used in downstream analyses; for example, the hierarchy count file can be imported into EstimateS or R to compare samples.
