Hierarchy Browser - Help
:: Taxonomy Choices
Select the taxonomy in which sequences will be displayed. This option is available only on the start page.
- The nomenclatural taxonomy displays sequences in a hierarchy based on the new phylogenetically consistent higher-order bacterial taxonomy, using a naïve Bayesian classifier trained on sequences from known type strains to assign sequences.
- NCBI displays sequences as classified in the NCBI taxonomy. This information is obtained directly from the sequence's GenBank record.
:: Data Set Options
Selecting different data sets alters the types of sequences available for browsing. You may select one or both options to restrict the kinds of sequences displayed.
- Strain: Selecting Type restricts the display to only sequences of known type strains.
- Source: Selecting Uncultured restricts the display to only sequences of environmental samples. Selecting Isolates restricts the display to only sequences of isolates.
- Size: Selecting >1200 Bases restricts the display to only near-full-length sequences.
- Quality: Selecting Good restricts the display to only good quality sequences. Sequences of suspect quality, as determined using Pintail, are flagged with an asterisk (*).
:: Browse
The lineage will display the ancestors of the current root taxon, starting from the highest to the lowest rank. Each taxon is followed by a short description of the status, including the number of selected sequences, total number of sequences assigned to that taxon and the number of search hits under that taxon.
The hierarchy view displays all the taxon nodes with sequences assigned to them in the hierarchical order, starting from a root taxon. Each line contains the taxon rank, name and the short description of status. The top taxon is the current root taxon.
Clicking any other taxon node will make that node display as the root and will update the hierarchy view. Toggle the button before a taxon to open or close the list of sequences directly assigned to that taxon.
To change to a different data set, see “Data Set Options”. After making the appropriate selections, click the Refresh button to update the view.
If you want to switch to a different taxonomy, see “Start Over” below.
:: Display Depth
Controls the number of ranks displayed in the hierarchy. With the default "Auto" setting, the program automatically adjusts the depth to display a reasonable number of lines in the browser. Increase the depth to see more ranks at the same time.
:: Search
The Search feature allows you to enter a keyword or keywords for matching against the RDP sequence identifiers and full sequence descriptions.
- Searches are case-insensitive.
- Boolean operators must be ALL UPPERCASE.
- The * or ? symbol cannot be used as the first character of a search.
- Only matches in the current options will be returned.
- The taxa with non-zero hits and the matching sequences will be highlighted.
- Search queries do not match taxon names or taxon ranks.
- The default search mode displays only taxa with non-zero hits and the matching sequences. Clicking "Show both hits and non-hits" displays all the taxa and sequences in the current options. Clicking "Show hits only" returns to the default search mode.
Our search uses the Lucene search engine developed by the Apache Jakarta project.
- Fields:
We support the following search fields:
Name    Description
ANN     the complete RDP annotation (the default field)
ID      RDP sequence identifiers
ACC     primary GenBank accession number
DT      date of last modification
DEF     definition
SRC     source
ORG     organism
REF     references
FTR     features
PROID   GenBank Project ID, from GenBank
When performing a search you can either specify a field, or use the default field. You can search any field by typing the field name followed by a colon ":" and then the term you are looking for.
For example, to find records whose definition mentions bacillus and whose references were published in FEMS Microbiol. Lett., enter: DEF:bacillus REF:"FEMS Microbiol. Lett".
- Boolean operators:
Boolean operators allow terms to be combined through logic operators.
The AND operator is the default conjunction operator.
(Note: Boolean operators must be ALL UPPERCASE).
- AND: To search for documents that contain both "Bacillus" and "clausii", use the query: Bacillus clausii.
- OR: To search for documents that contain either "Bacillus" or "clausii", use the query: Bacillus OR clausii.
- NOT: To search for documents that contain "Bacillus" but not "clausii", use the query: Bacillus NOT clausii.
- Grouping:
Use parentheses to group search words to form sub queries.
To search for either "Bacillus acidovorans" or "Comamonas acidovorans" use the query:
(Comamonas OR Bacillus) AND acidovorans.
- Wildcard Search:
To perform a single character wildcard search use the "?" symbol.
To perform a multiple character wildcard search use the "*" symbol.
(Note: The * or ? symbol cannot be used as the first character of a search.)
- Range Search:
Range queries match documents whose field values fall between the specified lower and upper bounds. Range queries can be inclusive or exclusive of the bounds. Inclusive range queries are denoted by square brackets ([]); exclusive range queries are denoted by curly brackets ({}). To find the records with accession numbers between AF068801 and AF068807, use the range search: ACC:[AF068801 TO AF068807].
Searches on different fields can be combined in one query, such as ACC:[DQ179017 TO DQ179020] OR ID:[S000481300 TO S000481500].
(Note: "TO" must be UPPERCASE.)
- Fuzzy Search:
To do a fuzzy search, use the tilde symbol, "~", at the end of a single-word term.
For example, to search for a term similar in spelling to "cillus", use the fuzzy search: cillus~.
This search will find terms like bacillus, bellus and Acidithiobacillus.
Note: Using "~" will slow down the search.
- Proximity Search:
To do a proximity search, use the tilde symbol, "~", at the end of a phrase.
For example, to search for "Escherichia" and "Acromyrmex" within 3 words of each other in a document, use the search: "Escherichia Acromyrmex"~3.
- Filtering Common Words:
Some common words are removed from the search. The following words are currently filtered out:
"locus", "rrna", "bp", "definition", "source", "organism", "reference", "authors", "title", "journal", "features", "location", "qualifiers", "comment", "genbank", "entry" and "bacteria".
Want to narrow or broaden searches? Go back to "Data Set Options" and select a different data set.
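The query syntax described in this section can also be assembled programmatically before pasting it into the search box. The following is a minimal sketch in Python that only builds query strings according to the rules listed above (uppercase operators, no leading wildcard, square or curly brackets for ranges). The helper names field_term, field_range, any_of and all_of are hypothetical and are not part of RDP or Lucene.

```python
import re

# Search fields listed in the "Fields" table of this help page.
FIELDS = {"ANN", "ID", "ACC", "DT", "DEF", "SRC", "ORG", "REF", "FTR", "PROID"}


def _check_field(field):
    if field not in FIELDS:
        raise ValueError(f"unknown search field: {field}")


def field_term(field, term):
    """Return 'FIELD:term', quoting the term when it contains whitespace."""
    _check_field(field)
    if term[:1] in ("*", "?"):
        # Rule from this page: * or ? cannot be the first character.
        raise ValueError("a search term cannot start with * or ?")
    if re.search(r"\s", term):
        term = f'"{term}"'
    return f"{field}:{term}"


def field_range(field, low, high, inclusive=True):
    """Return a range query; [] is inclusive, {} is exclusive."""
    _check_field(field)
    left, right = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{left}{low} TO {high}{right}"  # "TO" must be uppercase


def any_of(*clauses):
    """Group clauses with OR inside parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses):
    """Join clauses with an explicit AND (AND is also the default)."""
    return " AND ".join(clauses)


print(all_of(field_term("DEF", "bacillus"),
             field_term("REF", "FEMS Microbiol. Lett")))
# DEF:bacillus AND REF:"FEMS Microbiol. Lett"

print(any_of(field_range("ACC", "DQ179017", "DQ179020"),
             field_range("ID", "S000481300", "S000481500")))
# (ACC:[DQ179017 TO DQ179020] OR ID:[S000481300 TO S000481500])
```

The sketch writes AND explicitly for readability; because AND is the default operator, leaving it out should produce an equivalent query.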
:: Selecting Sequences
To select or deselect all sequences below a taxon, click the select or deselect icon in front of that taxon. You can also select or deselect a single sequence by checking or unchecking the checkbox before that sequence. A partial-selection icon indicates that only a subset of the sequences below a taxon is selected; click it once to select all of them. The number of sequences you have selected under the current data set is displayed in the status description of each taxon. As you browse, the total number of selections across all data sets is displayed at the top of the page. You can download the selections for local use (see "Download" below).
:: Viewing Sequences
Individual sequences can be viewed by clicking on the RDP sequence identifier link (e.g., S000002414).
:: Download
The Download page allows you to make choices about the formats of your data and data sets.
Download Formats: Three file formats are currently available for downloading your sequences: GenBank, FASTA and PHYLIP. The PHYLIP format does not allow "Remove all gaps" and is limited to 2000 sequences.
Alignment Gaps:
- Remove common gaps: gaps common to all of the sequences in the selected set are removed.
[Note: If you select Remove common gaps for a different group of sequences, a different set of alignment gaps will be removed.]
- Preserve all gaps: no gaps are removed. Selecting this will allow the alignment to be maintained between sets of sequences downloaded at different times.
[WARNING: With this option, the alignment will be >25,000 bases in length.]
- Remove all gaps: all gaps are removed and the alignment is lost.
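To make the difference between the gap options concrete, here is a minimal sketch in Python of what "Remove common gaps" and "Remove all gaps" mean for a small alignment. It is an illustration only, not the RDP implementation. The function names are hypothetical, and treating "-" and "." as the gap characters is an assumption.

```python
def remove_common_gaps(aligned, gap_chars="-."):
    """Drop alignment columns that are gaps in every sequence.

    Assumption: '-' and '.' mark gaps; all sequences have equal length.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [i for i in range(length)
            if any(seq[i] not in gap_chars for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


def remove_all_gaps(aligned, gap_chars="-."):
    """Strip every gap character; the sequences are no longer aligned."""
    return ["".join(c for c in seq if c not in gap_chars) for seq in aligned]


alignment = ["AC--GT-", "A---GTA", "AC-TG--"]
print(remove_common_gaps(alignment))  # ['AC-GT-', 'A--GTA', 'ACTG--']
print(remove_all_gaps(alignment))     # ['ACGT', 'AGTA', 'ACTG']
```

Only the third column is a gap in all three sequences, so it is the only one dropped by the first function. A different set of sequences would share different all-gap columns, which is why the columns removed depend on the selection.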
Navigation Tree: Download the navigation tree in Newick tree format. It can be imported into ARB. See the ARB import help for more details.
RNA Distance Matrix: The output format of the matrix is DNADist. If two sequences do not overlap, the distance between them is reported as "?". Selecting "Display RDP ID" will display the RDP sequence identifier (e.g., S000380829) in the output file. Selecting "Display Genbank ACCN" will display the GenBank accession and the sequence region (e.g., J01695.1:1518..3059) in the output file.
- Uncorrected: distances are reported without correction.
- Jukes-Cantor: distances are corrected using the Jukes-Cantor model.
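The two distance options can be illustrated with a short Python sketch. It uses the standard Jukes-Cantor formula, d = -(3/4) ln(1 - (4/3) p), where p is the uncorrected proportion of differing positions. How RDP handles gaps and ambiguity codes is not described on this page, so skipping any column where either sequence has a gap is an assumption, as are the function names.

```python
import math


def uncorrected_distance(a, b, gap_chars="-."):
    """Proportion of differing bases over columns where both sequences
    have a base. Returns None when the sequences do not overlap
    (reported as "?" in the matrix output)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # assumption: gap columns are ignored
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        return None
    return mismatches / compared


def jukes_cantor_distance(a, b, gap_chars="-."):
    """Jukes-Cantor corrected distance: -(3/4) * ln(1 - (4/3) * p)."""
    p = uncorrected_distance(a, b, gap_chars)
    if p is None:
        return None
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined at or above 0.75
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


print(uncorrected_distance("ACGT", "ACGA"))             # 0.25
print(round(jukes_cantor_distance("ACGT", "ACGA"), 4))  # 0.3041
print(uncorrected_distance("AC--", "--GT"))             # None
```

The corrected distance is always at least as large as the uncorrected one, because the correction estimates substitutions hidden by multiple changes at the same site.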
Data Set Options: These allow you to narrow the data set from which you want to download. See “Data Set Options” near the top of this help page. Click the "Refresh" button to update the data set options.
After choosing the appropriate settings, click on the "Download" button to retrieve your sequences.
:: Saving/Retrieving Seq Cart
Clicking the seqCART link on the top right of the page will open the Sequence Cart page. Clicking the "Save Cart" button will save the RDP sequence identifier and short description of all the selected sequences (from both the public RDP release and your aligned private myRDP sequences) to a file.
The saved sequences can be retrieved later by using the retrieve function. If the uploaded cart contains myRDP sequence identifiers, only myRDP sequences for which the researcher has valid access rights, either directly or as a Research Buddy, will be retrieved. You can also upload a file with a list of GenBank accession numbers. Select the correct file content type before you click the "Retrieve Cart" button. If any IDs from the retrieved file cannot be matched to sequences in the current RDP release or your myRDP account, clicking the "View problem IDs" button under the search box in the browser result window will open a new window with a list of the unknown IDs.
The sequence cart can be reset by clicking the "Reset Cart" button.
:: Start Over
Clicking the Start Over link at the top right of the page will take you back to the start page.
Questions/comments: rdpstaff@msu.edu
