Classifier FAQ

From Ribosomal Database Project Wiki

How do I cite the Classifier?
Wang, Q., G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007. Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl Environ Microbiol. 73(16):5261-7. The Classifier and MultiClassifier (for classifying multiple samples) command line programs, along with the source code, javadoc, example taxonomy and sequence files, and help files, are freely available from SourceForge and are released under the terms of the GNU General Public License.

How many sequences can the Classifier use?
The interactive web tool accepts up to 100,000 sequences. The limit rises to 500,000 sequences if you run the Classifier on RDPipeline.

What input format does Classifier use?
Classifier only accepts FASTA files.

I have over 500,000 sequences. What do I do?
You need to download the command line Classifier. This can be found here.
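
To see which option a file fits, you can count its sequences before uploading. The following is a minimal sketch, not part of the RDP tools: it assumes every FASTA record starts with a ">" header line, and the function names `count_fasta_records` and `suggest_tool` are made up for illustration. The limits are the ones quoted in this FAQ.

```python
# Limits quoted in this FAQ.
WEB_LIMIT = 100_000
RDPIPELINE_LIMIT = 500_000


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def suggest_tool(num_sequences):
    """Return the first option in this FAQ whose limit the count fits (hypothetical helper)."""
    if num_sequences <= WEB_LIMIT:
        return "interactive web tool"
    if num_sequences <= RDPIPELINE_LIMIT:
        return "RDPipeline"
    return "command line Classifier"
```

For example, `suggest_tool(count_fasta_records("my_sequences.fasta"))` names the option to try, where "my_sequences.fasta" is a placeholder file name.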

How do I use the command line Classifier tool?
Follow these steps:

1. Download the program to a folder of your choice
2. Unzip the download (for example, with 7-Zip)
3. Open a terminal (on Windows, the Command Prompt, formerly DOS)
4. Change to the folder that contains the program file "multiclassifier.jar"
5. Put the sequence file in the same folder
6. Type "java -Xmx1g -jar multiclassifier.jar" and press ENTER
7. Follow the on-screen printout for the input options

I cannot get the command line Classifier to work. What should I do?
Make sure that the latest version of Java is installed on your computer.
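
A quick way to check whether Java can be found from your terminal is sketched below. This is only an illustration under the assumption that Python 3.7 or later is available; the helper names `find_java` and `java_version_text` are hypothetical. It reports the installed version but cannot tell you whether that version is the latest.

```python
import shutil
import subprocess


def find_java():
    """Return the path of the 'java' executable on PATH, or None if absent."""
    return shutil.which("java")


def java_version_text(java_path):
    """Return the text printed by 'java -version' (Java writes it to stderr)."""
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True, check=False
    )
    return (result.stderr or result.stdout).strip()
```

If `find_java()` returns None, Java is either not installed or not on your PATH. Otherwise, pass the returned path to `java_version_text` to see which version will run.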

Does the Classifier take zipped or compressed files?
No, Classifier does not take zipped or compressed files.
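
If your sequences are stored in a gzip-compressed file, decompress them first. The sketch below uses only the Python standard library and assumes gzip compression specifically; other archive formats need their own tools. The function name `gunzip_to` is made up for illustration.

```python
import gzip
import shutil


def gunzip_to(src_path, dest_path):
    """Decompress a gzip file to dest_path, streaming so large files are not loaded into memory at once."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
    return dest_path
```

For example, `gunzip_to("my_sequences.fasta.gz", "my_sequences.fasta")` writes an uncompressed FASTA file; both file names are placeholders.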

Where can a training set for the RDP classifier be downloaded from?
You can find the training set here.

How do I train the Classifier on my own data?
The "sampledata" directory in the download contains the sample files needed for training. The README file describes the procedure.

Is there a minimum sequence length for the Classifier?
Sequences to be classified must be at least 50 base pairs long.
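
To find sequences that fall below this minimum before submitting a file, you could pre-filter it. The following is a minimal sketch, not RDP code: it assumes plain FASTA with ">" header lines and measures length as the number of sequence characters, including any ambiguity codes. The names `read_fasta` and `split_by_length` are hypothetical.

```python
MIN_LENGTH = 50  # minimum length stated in this FAQ


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, min_length=MIN_LENGTH):
    """Separate records into (long enough, too short) lists."""
    keep, too_short = [], []
    for header, seq in records:
        (keep if len(seq) >= min_length else too_short).append((header, seq))
    return keep, too_short
```

Passing an open file to `read_fasta` and the result to `split_by_length` gives one list of records that meet the minimum and one list of records to review or drop.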

Does Classifier separate out to the species level?
No, Classifier only goes as far as genus.

Classifier says that a sequence is unclassified. What does that mean?
When Classifier says a sequence is unclassified, it means that there is not enough evidence to assign the sequence to any of the existing taxa at that level, based on the type sequence information the Classifier was trained on.

How do I use Classifier to add taxonomic information to a cluster BIOM file generated by RDP or other tools?
Run the command-line Classifier with the "-m" and "-f biom" options together. Classifier assigns a taxonomic lineage to each OTU representative sequence and adds this information to each OTU in the BIOM file. Please be aware: the sequence identifiers in the input FASTA file need to match the OTU IDs in the BIOM file. RDP's Representative selection tool with the '-c' option generates this sequence input file. Please follow the detailed directions.
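
Because the sequence identifiers must match the OTU IDs, it can help to compare the two files before running the Classifier. The sketch below is an illustration under two assumptions: the BIOM file is in the JSON-based BIOM 1.0 format, where each entry in "rows" carries an "id" (HDF5-based BIOM files would need a different reader), and a sequence identifier is the first whitespace-delimited token of the FASTA header. All function names are hypothetical.

```python
import json


def fasta_ids(lines):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def biom_otu_ids(biom_text):
    """Collect OTU IDs from BIOM 1.0 JSON text, where each row has an 'id'."""
    table = json.loads(biom_text)
    return {str(row["id"]) for row in table["rows"]}


def compare_ids(fasta_id_set, otu_id_set):
    """Return (OTUs with no sequence, sequences with no OTU), both sorted."""
    return sorted(otu_id_set - fasta_id_set), sorted(fasta_id_set - otu_id_set)
```

If both returned lists are empty, every OTU has a representative sequence with a matching identifier.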

How do I test whether two samples have different taxonomic compositions?
The 'libcompare' command performs this test.
