Seq Match Help

From Ribosomal Database Project Wiki

Taxonomy Choices

Select the taxonomy in which sequences will be displayed. This option is available only on the start page.

The Nomenclatural taxonomy displays sequences in a hierarchy based on a schema closely matching that proposed in the new phylogenetically consistent higher-order bacterial taxonomy. Sequences are assigned to this hierarchy by a naïve Bayesian classifier trained on sequences from known type strains.

NCBI displays sequences as classified in the NCBI taxonomy. This information is obtained directly from the GenBank record of each sequence.

Data Set Options

Selecting different data sets alters the types of sequences available for browsing. You may select one or more of the options below to restrict the kinds of sequences displayed.

Strain
Selecting Type restricts the display to only sequences of known type strains.
Source
Select Uncultured to restrict the display to only sequences of environmental samples. Selecting Isolates restricts the display to only sequences of isolates.
Size
Select >1200 Bases to restrict the display to only near-full-length sequences.
Quality
You can view only good quality sequences, only suspect quality sequences, or both. Suspect quality sequences are flagged with an asterisk (*).

Display Depth

Display depth controls the number of ranks displayed in the hierarchy. With the default "Auto" setting, the program automatically adjusts the depth to display a reasonable number of lines in the browser. Increase the depth to see more ranks at the same time.

Summary

The summary page shows the taxon to which each query sequence belongs, as agreed on by all the match sequences of that query.

The lineage displays the ancestors of the current root taxon, from the highest rank to the lowest. Each taxon is followed by the number of query sequences under that taxon.

The hierarchy view displays, in hierarchical order, all the taxon nodes that have sequences assigned to them, starting from a root taxon. Each line contains the taxon rank, the taxon name and a short description of its status. The top taxon is the current root taxon. Clicking any other taxon node makes that node the root and updates the hierarchy view.

To change to a different data set, see "Data Set Options". After making the appropriate selections, click the Refresh button to update the view.

Selecting Matches

To select or deselect all sequences below a taxon, click the plus (Plus.gif) or minus (Minus.gif) icon in front of that taxon. You can also select or deselect a single sequence by checking or unchecking the checkbox in front of it. The Diag.gif icon indicates that only a subset of the sequences below a taxon is selected; click it once to select all of them. As you browse, the total number of selections across all data sets is displayed at the top of the page. Click the "Save selection and return to summary" button to go back to the summary page. The selections are also saved in the Sequence Cart. You can download the selections for local use.

Printer Friendly Results

From the summary page, click the "show printer friendly result" link next to the root taxon to display all the match results under that root taxon. The results for each query sequence are shown in hierarchical order.

Input Formats

Currently the FASTA, GenBank and EMBL formats are accepted. Sequences may be in uppercase or lowercase. The number of query sequences is limited to 2000. If you need more than 2000 sequences classified, please contact rdpstaff@msu.edu.
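The input rules above can be checked locally before submitting. The following is a minimal sketch for FASTA input only; the function names `read_fasta` and `check_query_count` are hypothetical helpers written for this illustration and are not part of any RDP tool.

```python
MAX_QUERIES = 2000  # query limit stated on this help page


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Upper- and lowercase are both accepted, so normalise to one case.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_query_count(records, limit=MAX_QUERIES):
    """Return True when the number of query sequences is within the limit."""
    return len(records) <= limit


example = ">q1 test\nacgtACGT\nacgt\n>q2\nGGCC\n"
records = read_fasta(example)
print(len(records), "query sequences; within limit:", check_query_count(records))
```

GenBank and EMBL records have a richer structure and would need a dedicated parser; they are left out of this sketch.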

Viewing Sequences

Individual sequences can be viewed by clicking on the RDP sequence identifier link (e.g., S000002414).

Starting a New Match

Clicking the "new match" link at the top right of each page removes all the selections you made while browsing the previous results and takes you back to the start page.

Result Format

Each match result line contains six elements, from left to right:

1. A short ID that uniquely identifies the RDP sequence. Clicking it returns the simple entry, including the sequence.
2. The orientation of the query sequence when the match is performed. "-" means the query sequence has been reverse-complemented. A top match hit with "-" orientation usually indicates that the query sequence is a minus strand.
3. A similarity score. SeqMatch reports the percent sequence identity over all pairwise comparable positions when run with aligned myRDP sequences. (Comparable positions are aligned positions containing a base in both sequences.) Note that the rank order may differ between S_ab and pairwise identity scores, but the top 20 S_ab scores will contain the closest sequence by pairwise identity about 95% of the time (Cole et al.). If two sequences do not overlap, the similarity between them is displayed as "?".
4. A seqmatch score (S_ab). This is the number of unique 7-base oligomers shared between your sequence and a given RDP sequence, divided by the lowest number of unique oligos in either of the two sequences.
5. The number of uniquely occurring oligomers within a given sequence (Olis). An oligomer that occurs more than once is counted only once; thus this number only approximately reflects the sequence length. Counting only unique oligos compensates somewhat for composition bias (for example, inserts tend to be GC-rich and it becomes very likely that the same GC-rich oligos occur several times; by counting these only once, this artifact becomes less severe).
6. Full name. The definition line from the RDP distribution, often the same as the genus/species/strain name and accession number.
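The scores in the list above can be illustrated with a short sketch. It follows the definitions as worded on this page (unique 7-base oligomers, S_ab as shared oligos divided by the smaller unique-oligo count, identity over comparable positions) and is not the SeqMatch implementation. The function names, the use of "-" as the gap character, and the handling of sequences shorter than 7 bases are assumptions made for this illustration; ambiguity codes are not treated specially.

```python
K = 7  # oligomer length given in the S_ab definition


def unique_oligos(seq, k=K):
    """Set of distinct k-base oligomers in seq; its size is the 'Olis' value."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sab(query, target, k=K):
    """Shared unique oligos divided by the smaller unique-oligo count."""
    q, t = unique_oligos(query, k), unique_oligos(target, k)
    smaller = min(len(q), len(t))
    if smaller == 0:
        return 0.0  # assumption: a sequence shorter than k scores zero
    return len(q & t) / smaller


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (the '-' orientation of a query)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over comparable positions of two aligned sequences.

    Comparable positions hold a base in both sequences. Returns None when
    the sequences do not overlap, which the results page shows as "?".
    """
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return None
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


query = "ACGTACGTACG"
print(len(unique_oligos(query)), sab(query, query))
print(percent_identity("AC-GT", "ACTG-"))
```

Because repeated oligomers collapse into one set element, `len(unique_oligos(seq))` can be smaller than the number of 7-base windows in the sequence, which is why Olis only approximates sequence length.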

Downloading Selected Sequences

The selection can be downloaded from the seqCART on the main tool menu.

Display KNN Matches

Display KNN matches controls the number of matches (k) displayed per query sequence. The same number of matches is used to classify queries by unanimous vote. The maximum value for k is 20.
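One way to picture classification by unanimous vote is to keep only the part of the lineage on which all k matches agree. The sketch below assumes that each match carries its lineage as a list of taxon names from highest to lowest rank and that the k best matches are chosen by S_ab; both are assumptions made for this illustration, and `unanimous_lineage` and `classify` are hypothetical names rather than RDP functions.

```python
MAX_K = 20  # maximum value for k stated above


def unanimous_lineage(lineages):
    """Return the longest lineage prefix shared by every match."""
    if not lineages:
        return []
    agreed = []
    # zip stops at the shortest lineage, so only shared ranks are compared.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            agreed.append(taxa_at_rank[0])
        else:
            break  # the first disagreement ends the unanimous part
    return agreed


def classify(matches, k=MAX_K):
    """Classify a query from (score, lineage) pairs using the k best matches."""
    if not 1 <= k <= MAX_K:
        raise ValueError("k must be between 1 and 20")
    # Assumption: matches are ranked by score, highest first.
    top = sorted(matches, key=lambda m: m[0], reverse=True)[:k]
    return unanimous_lineage([lineage for _, lineage in top])


family = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
          "Enterobacteriales", "Enterobacteriaceae"]
matches = [
    (0.98, family + ["Escherichia"]),
    (0.97, family + ["Escherichia"]),
    (0.90, family + ["Salmonella"]),
]
print(classify(matches, k=2))
print(classify(matches, k=3))
```

In this example the two best matches agree down to the genus, while adding the third match moves the agreed placement up to the family. A larger k therefore tends to give a more conservative placement.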
