Lib Compare Help
Upload two sequence files from the start page. Currently, Fasta, GenBank and EMBL formats are accepted. Sequences may be in uppercase or lowercase.
The Library Compare Tool requires each sequence to have at least 200 bases to give a good classification result. The number of query sequences is limited to 40000 for each library. If you need to compare libraries with more than 40000 sequences, please contact email@example.com.
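Before uploading, it can help to check a library against the two limits above. The following is a minimal sketch for Fasta input only; the function names `read_fasta` and `check_library` are hypothetical and are not part of the tool.

```python
from io import StringIO

MIN_BASES = 200        # minimum sequence length stated on this help page
MAX_SEQUENCES = 40000  # per-library limit stated on this help page


def read_fasta(handle):
    """Yield (name, sequence) pairs from a Fasta-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_library(handle):
    """Return (total sequences, names of short sequences, within size limit)."""
    total, short = 0, []
    for name, seq in read_fasta(handle):
        total += 1
        if len(seq) < MIN_BASES:
            short.append(name)
    return total, short, total <= MAX_SEQUENCES


if __name__ == "__main__":
    demo = StringIO(">seqA example\n" + "ACGT" * 60 + "\n>seqB\nacgtacgt\n")
    print(check_library(demo))
```

In practice the handle would be an opened library file rather than the in-memory demo text used here.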
View Results in Table
The comparison results are displayed in a table sorted by significance value. Each row contains the taxon rank, name, number of assignments from library 1, number of assignments from library 2, and the significance of the difference. Clicking the link on the taxon name will position that taxon as root and display the results in the hierarchy view.
On the tabular view page, click the "download compare results as text" button to download the results to your local hard disk. The results can be imported into a spreadsheet program for additional analysis. One example presentation of the comparison results is a pie chart for each library showing the percentage of assignments to each phylum.
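The percentages for such a pie chart can be computed directly from the per-phylum assignment counts. This is a small sketch; the function name and the counts in the example are made up for illustration and do not come from a real comparison.

```python
def phylum_percentages(counts):
    """Convert per-phylum assignment counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for one library.
    library1 = {"Firmicutes": 30, "Proteobacteria": 50, "Bacteroidetes": 20}
    for name, pct in phylum_percentages(library1).items():
        print(f"{name}: {pct:.1f}%")
```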
View Results in Hierarchy
The comparison results are shown in a taxonomic hierarchy. A bar graph gives a visual representation of the distributions of the two libraries among the immediate children of the current taxon root.
The lineage displays the ancestors of the current root taxon, starting from the highest to the lowest rank.
The hierarchy view displays all the taxon nodes with sequences assigned to them in hierarchical order. Each line contains the taxon rank, name, the number of assignments from library 1, the number of assignments from library 2 and the significance of the difference. The top taxon is the current taxon root. If you click the “show assignment detail” link on that line, a detailed classification result of the sequences that are assigned to that taxon will be displayed. Clicking any other taxon node will make that node display as the root and will update the hierarchy view.
Each taxon may have an unclassified node below it.
For each rank assignment, the Classifier automatically estimates the classification reliability using bootstrapping. Ranks where sequences could not be assigned with a bootstrap confidence estimate above the threshold are displayed under an artificial 'unclassified' taxon. The assignments to an unclassified taxon are not compared. The significance value of the unclassified taxon is then displayed as 'NA'. The default threshold is 80%.
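The effect of the threshold on a single sequence can be sketched as follows. This is an illustration only, not the Classifier's own code: the function name, the `unclassified_<parent>` label, the "Root" fallback, and the treatment of a value exactly at the threshold as passing are all assumptions.

```python
def trim_lineage(lineage, threshold=0.80):
    """Keep ranks down to the first one whose confidence is below threshold.

    `lineage` is a list of (rank, taxon_name, confidence) tuples ordered from
    the highest rank to the lowest. The first rank that fails the threshold is
    replaced by an artificial 'unclassified' entry under the last confident
    taxon, and all lower ranks are dropped.
    """
    kept = []
    for rank, name, confidence in lineage:
        if confidence < threshold:
            parent = kept[-1][1] if kept else "Root"
            kept.append((rank, "unclassified_" + parent, None))
            break
        kept.append((rank, name, confidence))
    return kept


if __name__ == "__main__":
    # Hypothetical bootstrap confidences for one query sequence.
    example = [
        ("domain", "Bacteria", 1.00),
        ("phylum", "Firmicutes", 0.95),
        ("class", "Bacilli", 0.60),
        ("order", "Lactobacillales", 0.50),
    ]
    for entry in trim_lineage(example):
        print(entry)
```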
Display depth controls the number of ranks displayed in the hierarchy. With the default "Auto" setting, the program automatically adjusts the depth to display a reasonable number of lines in the browser. Increase the depth to see more ranks at the same time.
To view the detailed classification results for individual queries assigned to a taxon, click the link [show details] beside the current root taxon. Note that the symbol "-" after a sequence name means the results were obtained using the reverse complement of that query sequence. This indicates that the query sequence is in reverse orientation.
On the assignment detail page, click the "download assignments as text" button to download the assignments to your local hard disk. You can save the results as a semi-colon-delimited text file and import it into a spreadsheet program such as Excel.
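For reference, the reverse complement of a query can be computed as in the sketch below. It handles only A, C, G, T and U in either case; other characters, including IUPAC ambiguity codes, are left unchanged, which is a simplification made for this example.

```python
# Translation table mapping each base to its complement (U is treated like T).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence as DNA."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AAGC"))
```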
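Outside a spreadsheet, the semi-colon-delimited file can be read with Python's standard `csv` module. The sketch below makes no assumption about the meaning of each column; the sample rows are invented for illustration and do not reproduce the tool's actual layout.

```python
import csv
from io import StringIO


def read_semicolon_rows(handle):
    """Read semi-colon-delimited text into a list of rows of trimmed fields."""
    reader = csv.reader(handle, delimiter=";")
    return [[field.strip() for field in row] for row in reader if row]


if __name__ == "__main__":
    # Hypothetical content; a real download would be opened with open(path, newline="").
    sample = StringIO("seqA; -; Bacteria; domain; 1.0\nseqB; ; Bacteria; domain; 0.98\n")
    for row in read_semicolon_rows(sample):
        print(row)
```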
The classification and Library Compare algorithms have been published in "Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy".
Our Library Compare Tool uses the RDP naïve Bayesian classifier to provide rapid classification of library sequences into the bacterial taxonomy. This classifier is trained on known type strain rRNA sequences (and a small number of other sequences representing regions of bacterial diversity with few named organisms). Each library sequence is assigned to a set of hierarchical taxa from the phylum to genus rank, along with a confidence estimate for each assignment.
The Library Compare Tool estimates the likelihood that the frequency of membership in a given taxon is the same for the two libraries. When the frequencies are small, it uses a statistical test first developed for comparing transcript levels in "digital Northern" analysis (Audic et al., The Significance of Digital Gene Expression Profiles). The probability of the observed difference in assignment to taxon T is estimated as:
$$p(y \mid x) = \left(\frac{N_2}{N_1}\right)^{y} \frac{(x+y)!}{x!\,y!\,\left(1+\frac{N_2}{N_1}\right)^{x+y+1}}$$
where N1 and N2 are the total number of sequences for library 1 and 2 respectively, and x and y are the number of sequences assigned to T from library 1 and 2 respectively. One underlying assumption for this equation is that x and y are small relative to N1 and N2 (less than 5% of the total), and that N1 and N2 are relatively large (above 500).
For larger frequencies (in our case, when x and y are >= 5), we use the standard two-population proportions test, assuming the test statistic approximately follows a standard normal distribution. The p-value is estimated from the resulting z value.
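As an illustration, the point probability from the cited Audic et al. paper, p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)), can be evaluated in log space to avoid overflow in the factorials. The function name is hypothetical, and how the tool turns this probability into the reported significance value is not described here, so this sketch stops at the probability itself.

```python
import math


def audic_claverie_probability(x, y, n1, n2):
    """Probability of seeing y assignments in library 2 given x in library 1.

    x, y   : sequences assigned to the taxon from library 1 and library 2
    n1, n2 : total number of sequences in library 1 and library 2
    """
    ratio = n2 / n1
    log_p = (
        y * math.log(ratio)
        + math.lgamma(x + y + 1)      # log((x + y)!)
        - math.lgamma(x + 1)          # log(x!)
        - math.lgamma(y + 1)          # log(y!)
        - (x + y + 1) * math.log1p(ratio)
    )
    return math.exp(log_p)


if __name__ == "__main__":
    # Hypothetical counts: 2 of 1000 sequences versus 4 of 1200 sequences.
    print(audic_claverie_probability(2, 4, 1000, 1200))
```

For a fixed x, these probabilities sum to 1 over all possible values of y, which is a convenient sanity check on an implementation.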
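A two-population proportions test of this kind can be sketched as below. The pooled standard error and the two-sided p-value are assumptions made for this example; the help text does not say which variants the tool uses, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(x, y, n1, n2):
    """Return (z, two-sided p-value) for comparing x/n1 against y/n2."""
    p1, p2 = x / n1, y / n2
    pooled = (x + y) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if std_err == 0.0:
        # Both proportions are 0 or both are 1: no difference to test.
        return 0.0, 1.0
    z = (p1 - p2) / std_err
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 50 of 1000 sequences versus 100 of 1000 sequences.
    print(two_proportion_z_test(50, 100, 1000, 1000))
```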
Taxonomic Hierarchy Model
The current hierarchy model used by the naïve Bayesian rRNA classifier comes from the one proposed in the new phylogenetically consistent higher-order bacterial taxonomy, with some minor changes for lineages with few cultivated members. From the highest to the lowest, the major formal taxonomic ranks are: domain, phylum, class, order, family, genus and species. There are occasional intermediate ranks such as 'subclass' and 'suborder'.
For rare events, the statistical test used by Library Compare is reliable when the library size is large. In addition, the equation is correct for a single test. It may also be used to test multiple taxa between libraries.