Browsers Help

From Ribosomal Database Project Wiki

Taxonomy Choices

Select the taxonomy in which sequences will be displayed. This option is available only on the start page. The nomenclatural taxonomy displays sequences in a hierarchy based on the new phylogenetically consistent higher-order bacterial taxonomy; sequences are assigned to taxa by a naïve Bayesian classifier trained on sequences from known type strains. The NCBI taxonomy displays sequences as classified by NCBI; this information is obtained directly from each sequence's GenBank record.

Data Set Options

Selecting different data sets alters the types of sequences available for browsing. You may select one or both options to restrict the kinds of sequences displayed.

Strain
Selecting Type restricts the display to only sequences of known type strains.
Source
Selecting Uncultured restricts the display to only sequences of environmental samples. Selecting Isolates restricts the display to only sequences of isolates.
Size
Selecting >1200 Bases restricts the display to only near-full-length sequences.
Quality
Selecting Good restricts the display to only good-quality sequences. Sequences flagged with an asterisk (*) were identified as suspect quality by Pintail.

Browse

The lineage displays the ancestors of the current root taxon, from the highest to the lowest rank. Each taxon is followed by a short status description: the number of selected sequences, the total number of sequences assigned to that taxon, and the number of search hits under that taxon.

The hierarchy view displays, in hierarchical order, all taxon nodes that have sequences assigned to them, starting from a root taxon. Each line contains the taxon rank, the taxon name, and the status description. The top taxon is the current root taxon.

Clicking any other taxon node makes that node the root and updates the hierarchy view. Use the toggle button in front of a taxon to show or hide the sequences assigned directly to that taxon.
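The lineage and hierarchy views can be pictured as operations on a taxon tree. The sketch below is a minimal illustration, not RDP's implementation: the `Taxon` class, the helper names and the toy counts are hypothetical, and it assumes that a taxon's total includes the sequences assigned to all of its descendants.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Taxon:
    """Hypothetical taxon node (not RDP's actual data model)."""
    rank: str
    name: str
    direct_seqs: int = 0  # sequences assigned directly to this taxon
    parent: Optional[Taxon] = field(default=None, repr=False)
    children: list[Taxon] = field(default_factory=list, repr=False)

    def add_child(self, child: Taxon) -> Taxon:
        child.parent = self
        self.children.append(child)
        return child


def lineage(taxon: Taxon) -> list[Taxon]:
    """Ancestors of `taxon`, ordered from the highest to the lowest rank."""
    path = []
    node = taxon.parent
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def total_seqs(taxon: Taxon) -> int:
    """Sequences assigned to this taxon or to any taxon below it (assumption)."""
    return taxon.direct_seqs + sum(total_seqs(c) for c in taxon.children)


def hierarchy_lines(root: Taxon, depth: int = 0) -> list[str]:
    """Indented 'rank name (count)' lines, skipping taxa with no sequences."""
    if total_seqs(root) == 0:
        return []
    lines = ["  " * depth + f"{root.rank} {root.name} ({total_seqs(root)})"]
    for child in root.children:
        lines.extend(hierarchy_lines(child, depth + 1))
    return lines


# Toy tree with made-up counts.
root = Taxon("rootrank", "Root")
bacteria = root.add_child(Taxon("domain", "Bacteria"))
firmicutes = bacteria.add_child(Taxon("phylum", "Firmicutes"))
bacilli = firmicutes.add_child(Taxon("class", "Bacilli", direct_seqs=5))
firmicutes.add_child(Taxon("class", "Clostridia"))  # no sequences, so not listed

print([t.name for t in lineage(bacilli)])  # ['Root', 'Bacteria', 'Firmicutes']
print("\n".join(hierarchy_lines(firmicutes)))
```

Making Firmicutes the root (as clicking it would) lists only Firmicutes and Bacilli; Clostridia is omitted because no sequences are assigned to it.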


To change to a different data set, see “Data Set Options”. After making the appropriate selections, click the Refresh button to update the view.
To switch to a different taxonomy, see “Starting Over” below.

Display Depth

Display depth controls the number of ranks displayed in the hierarchy. With the default "Auto" setting, the program adjusts the depth automatically so that the browser shows a reasonable number of lines. Increase the depth to see more ranks at once.

Search

The Search feature allows you to enter a keyword or keywords for matching against the RDP sequence identifiers and full sequence descriptions.

Here are some important points:

  • Searches are case-insensitive.
  • Boolean operators must be ALL UPPERCASE.
  • The * or ? symbol cannot be used as the first character of a search.
  • Only matches within the current data set options are returned.
  • Taxa with non-zero hits, and matching sequences, are highlighted.
  • Search queries do not match taxon names or taxon ranks.
  • The default search mode displays only taxa with non-zero hits and matching sequences. Clicking "Show both hits and non-hits" displays all taxa and sequences in the current data set options. Clicking "Show hits only" returns to the default search mode.
  • Searching uses the Lucene search engine developed by the Apache Jakarta project.

Search Fields and Their Descriptions



Name    Description
ANN     The complete RDP annotation (the default field)
ID      RDP sequence identifier
ACC     Primary GenBank accession number
DT      Date of last modification
DEF     Definition
SRC     Source
ORG     Organism
REF     References
FTR     Features
PROID   GenBank Project ID

When performing a search, you can either specify a field or use the default field (ANN). To search a specific field, type the field name, a colon (":"), and then the term you are looking for. For example, to find Bacillus records whose references were published in FEMS Microbiol. Lett., enter: DEF:bacillus REF:"FEMS Microbiol. Lett".
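For scripted use, a field-qualified query string like the one above can be assembled programmatically. This is a minimal sketch: `field_term` and `build_query` are hypothetical helpers, not part of RDP or Lucene, and the quoting rule (wrap any term containing whitespace in double quotes) is an assumption based on the REF example.

```python
# Field names from the search-field table above.
FIELDS = {"ANN", "ID", "ACC", "DT", "DEF", "SRC", "ORG", "REF", "FTR", "PROID"}


def field_term(field: str, term: str) -> str:
    """Return 'FIELD:term', quoting multi-word terms as a phrase (hypothetical helper)."""
    name = field.upper()
    if name not in FIELDS:
        raise ValueError(f"unknown search field: {field}")
    if any(ch.isspace() for ch in term):
        term = '"' + term.replace('"', '\\"') + '"'
    return f"{name}:{term}"


def build_query(*pairs) -> str:
    """Join (field, term) clauses with spaces; AND is the default conjunction."""
    return " ".join(field_term(f, t) for f, t in pairs)


query = build_query(("DEF", "bacillus"), ("REF", "FEMS Microbiol. Lett"))
print(query)  # DEF:bacillus REF:"FEMS Microbiol. Lett"
```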


Boolean Operators

Boolean operators combine terms through logical operators. AND is the default conjunction operator. (Note: Boolean operators must be ALL UPPERCASE.)
AND: To search for documents that contain both "Bacillus" and "clausii", use the query: Bacillus clausii.
OR: To search for documents that contain either "Bacillus" or "clausii", use the query: Bacillus OR clausii.
NOT: To search for documents that contain "Bacillus" but not "clausii", use the query: Bacillus NOT clausii.


Grouping

Use parentheses to group search terms into subqueries. To search for either "Bacillus acidovorans" or "Comamonas acidovorans", use the query: (Comamonas OR Bacillus) AND acidovorans.
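The Boolean operators and grouping above behave like set operations on the documents matching each term: AND is intersection, OR is union, NOT is difference, and parenthesized subqueries are evaluated first. The toy index below is hypothetical and ignores Lucene's analyzers and scoring; it only illustrates which documents each query selects.

```python
import re

# Toy "documents" (made-up descriptions, not RDP records).
DOCS = {
    1: "Bacillus clausii",
    2: "Bacillus subtilis",
    3: "Comamonas acidovorans",
    4: "Bacillus acidovorans",
    5: "Delftia acidovorans",
}

# Lowercased term sets, since searches are case-insensitive.
INDEX = {doc_id: set(re.findall(r"\w+", text.lower())) for doc_id, text in DOCS.items()}


def hits(term: str) -> set:
    """IDs of documents containing `term`."""
    return {doc_id for doc_id, terms in INDEX.items() if term.lower() in terms}


and_q = hits("Bacillus") & hits("clausii")   # Bacillus clausii
or_q = hits("Bacillus") | hits("clausii")    # Bacillus OR clausii
not_q = hits("Bacillus") - hits("clausii")   # Bacillus NOT clausii
# (Comamonas OR Bacillus) AND acidovorans
group_q = (hits("Comamonas") | hits("Bacillus")) & hits("acidovorans")

print(sorted(and_q), sorted(or_q), sorted(not_q), sorted(group_q))
# [1] [1, 2, 4] [2, 4] [3, 4]
```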


Wildcard Search

To perform a single-character wildcard search, use the "?" symbol. To perform a multiple-character wildcard search, use the "*" symbol. (Note: * or ? cannot be used as the first character of a search.)
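A wildcard pattern is matched against one term at a time: "?" stands for exactly one character and "*" for any run of characters, including none. This sketch translates such a pattern into a Python regular expression; `wildcard_match` is a hypothetical helper that models only the rules stated on this page (case-insensitive matching, no leading wildcard), not Lucene's internals.

```python
import re


def wildcard_match(pattern: str, term: str) -> bool:
    """Match one term against a pattern using ? (one char) and * (zero or more chars)."""
    if pattern[:1] in ("*", "?"):
        raise ValueError("* or ? cannot be the first character of a search")
    parts = []
    for ch in pattern.lower():
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.fullmatch("".join(parts), term.lower(), flags=re.DOTALL) is not None


print(wildcard_match("bacill?s", "Bacillus"))      # True
print(wildcard_match("bacill?s", "Bacilluss"))     # False
print(wildcard_match("strepto*", "Streptomyces"))  # True
```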

Range Search

Range queries match documents whose field values fall between the lower and upper bounds specified in the query. A range query can be inclusive or exclusive of its bounds: inclusive range queries are denoted by square brackets ([]), and exclusive range queries by curly brackets ({}). To find records with accession numbers between AF068801 and AF068807, use the range search: ACC:[AF068801 TO AF068807]. Searches on different fields can be combined in one query, such as ACC:[DQ179017 TO DQ179020] OR ID:[S000481300 TO S000481500]. (Note: "TO" must be UPPERCASE.)
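An inclusive range keeps values equal to either bound; an exclusive range drops them. The sketch assumes values are compared as plain strings, which orders accession numbers and RDP identifiers as expected only when they share the same prefix and length. `in_range` is a hypothetical helper, and the accession list is made up for illustration.

```python
def in_range(value: str, lower: str, upper: str, inclusive: bool = True) -> bool:
    """[lower TO upper] when inclusive, {lower TO upper} otherwise (string order)."""
    if inclusive:
        return lower <= value <= upper
    return lower < value < upper


accessions = ["AF068800", "AF068801", "AF068804", "AF068807", "AF068808"]

# ACC:[AF068801 TO AF068807]
print([a for a in accessions if in_range(a, "AF068801", "AF068807")])
# ['AF068801', 'AF068804', 'AF068807']

# ACC:{AF068801 TO AF068807}
print([a for a in accessions if in_range(a, "AF068801", "AF068807", inclusive=False)])
# ['AF068804']
```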

Fuzzy Search

To do a fuzzy search, use the tilde ("~") symbol at the end of a single-word term. For example, to search for terms similar in spelling to "cillus", use the fuzzy search: cillus~. This search will find terms like bacillus, bellus and Acidithiobacillus. Note: fuzzy searches are slower than ordinary searches.
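Lucene's fuzzy queries are based on edit (Levenshtein) distance: the number of single-character insertions, deletions and substitutions needed to turn one term into another. The sketch below computes that distance. The `max_edits=2` cutoff in the hypothetical `fuzzy_match` helper is an assumption for illustration; the threshold applied by RDP's Lucene version is not stated here, so this sketch does not reproduce RDP's hit list exactly (for example, it rejects Acidithiobacillus).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_match(query: str, term: str, max_edits: int = 2) -> bool:
    """Hypothetical cutoff: accept terms within `max_edits` edits of the query."""
    return levenshtein(query.lower(), term.lower()) <= max_edits


for term in ["bacillus", "bellus", "Acidithiobacillus"]:
    print(term, levenshtein("cillus", term.lower()), fuzzy_match("cillus", term))
# bacillus 2 True
# bellus 2 True
# Acidithiobacillus 11 False
```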

Proximity Search

To do a proximity search, use the tilde ("~") symbol at the end of a phrase. For example, to search for "Escherichia" and "Acromyrmex" within 3 words of each other in a document, use the search: "Escherichia Acromyrmex"~3.
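Lucene documents phrase slop as an edit distance measured in moves of query terms out of position, where swapping two words takes two moves. For a two-word phrase, that means k intervening words cost k moves when the words appear in the query's order, and k + 2 moves when they appear reversed. The sketch below applies that rule to a simple word tokenization; `within_slop` is a hypothetical helper, and Lucene's actual tokenization may differ.

```python
import re


def within_slop(text: str, first: str, second: str, slop: int) -> bool:
    """True if the phrase "first second"~slop would match `text` (two-word case)."""
    words = re.findall(r"\w+", text.lower())
    firsts = [i for i, w in enumerate(words) if w == first.lower()]
    seconds = [j for j, w in enumerate(words) if w == second.lower()]
    # Moves needed: 0 when adjacent and in order, k for k words in between,
    # and k + 2 when the two words appear in reverse order.
    return any(abs(i + 1 - j) <= slop for i in firsts for j in seconds)


text = "Escherichia coli isolated from Acromyrmex ants"
print(within_slop(text, "Escherichia", "Acromyrmex", 3))  # True (3 words between)
print(within_slop(text, "Escherichia", "Acromyrmex", 2))  # False
```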

Filtering Common Words

Some common words are removed from searches. The following words are currently filtered out: "locus", "rrna", "bp", "definition", "source", "organism", "reference", "authors", "title", "journal", "features", "location", "qualifiers", "comment", "genbank", "entry" and "bacteria".
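The filtering step can be sketched as dropping the listed words from a query before matching. Whether RDP filters words at query time, at indexing time, or both is not stated here, so `filter_query_terms` is a hypothetical helper that only shows the effect on the plain terms of a query (it does not inspect field-qualified terms such as DEF:...).

```python
# Words this help page lists as filtered out of searches.
FILTERED_WORDS = {
    "locus", "rrna", "bp", "definition", "source", "organism", "reference",
    "authors", "title", "journal", "features", "location", "qualifiers",
    "comment", "genbank", "entry", "bacteria",
}


def filter_query_terms(query: str) -> list:
    """Drop filtered words (case-insensitively) from a whitespace-separated query."""
    return [term for term in query.split() if term.lower() not in FILTERED_WORDS]


print(filter_query_terms("Bacillus 16S rRNA bacteria"))  # ['Bacillus', '16S']
```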

Narrow or Broaden Search

To narrow or broaden a search, go back to "Data Set Options" and select a different data set.

Selecting Sequences

To select or deselect all sequences below a taxon, click the select or deselect icon in front of that taxon. You can also select or deselect a single sequence by checking or unchecking the checkbox in front of it. A separate icon indicates that only a subset of the sequences below a taxon is selected; click it once to select all sequences. The number of sequences you have selected under the current data set is displayed in the status description of each taxon. As you browse, the total number of selections across all data sets is displayed at the top of the page. You can download the selections for local use (see "Download" below).
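The taxon-level selection icon behaves like a three-state checkbox derived from the sequences below the taxon. The sketch below models that behavior as described on this page; `SelectionState`, `taxon_state` and `click_taxon` are hypothetical names, and the sequence IDs in the example are placeholders.

```python
from enum import Enum


class SelectionState(Enum):
    NONE = "none"  # no sequences below the taxon are selected
    SOME = "some"  # subset icon: only some are selected
    ALL = "all"    # every sequence below the taxon is selected


def taxon_state(selected: set, taxon_seqs: set) -> SelectionState:
    """Derive a taxon's icon state from the current selection."""
    chosen = len(taxon_seqs & selected)
    if chosen == 0:
        return SelectionState.NONE
    if chosen == len(taxon_seqs):
        return SelectionState.ALL
    return SelectionState.SOME


def click_taxon(selected: set, taxon_seqs: set) -> set:
    """Deselect all if fully selected; otherwise (none or some) select all."""
    if taxon_state(selected, taxon_seqs) is SelectionState.ALL:
        return selected - taxon_seqs
    return selected | taxon_seqs


taxon = {"seqA", "seqB", "seqC"}  # placeholder sequence IDs
selected = {"seqA"}
print(taxon_state(selected, taxon))  # SelectionState.SOME
selected = click_taxon(selected, taxon)
print(taxon_state(selected, taxon))  # SelectionState.ALL
selected = click_taxon(selected, taxon)
print(sorted(selected))              # []
```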

Viewing Sequences

Individual sequences can be viewed by clicking the RDP sequence identifier link (e.g., S000002414).

Download

The Download page lets you choose the file format and other options for the sequences you download.

Download Formats

Three file formats are currently available for downloading your sequences: GenBank, FASTA and PHYLIP. The PHYLIP format does not support "Remove all gaps" and is limited to 2000 sequences.

Alignment Gaps

Remove common gaps
Gaps common to all of the sequences in the selected set are removed. [Note: If you select Remove common gaps for a different group of sequences, a different set of alignment gaps will be removed.]
Preserve all gaps
No gaps are removed. Selecting this will allow the alignment to be maintained between sets of sequences downloaded at different times. [WARNING: With this option, the alignment will be >25,000 bases in length.]
Remove all gaps
All gaps are removed and the alignment is lost.
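The two gap-removal options can be sketched as column and character filters over an aligned set. This assumes every aligned sequence has the same length and that both "-" and "." mark gaps (an assumption, not a documented RDP rule); the function names are hypothetical.

```python
GAP_CHARS = set("-.")  # assumption: both characters mark alignment gaps


def remove_common_gaps(aligned: list) -> list:
    """Drop columns that are gaps in every sequence of the given set."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    keep = [col for col in range(width)
            if any(seq[col] not in GAP_CHARS for seq in aligned)]
    return ["".join(seq[col] for col in keep) for seq in aligned]


def remove_all_gaps(aligned: list) -> list:
    """Strip every gap character; the result is no longer aligned."""
    return ["".join(ch for ch in seq if ch not in GAP_CHARS) for seq in aligned]


aln = ["A-C-G", "A-CT-", "..C-G"]
print(remove_common_gaps(aln))               # ['AC-G', 'ACT-', '.C-G']
print(remove_common_gaps([aln[0], aln[2]]))  # ['ACG', '.CG']
print(remove_all_gaps(aln))                  # ['ACG', 'ACT', 'CG']
```

Because `remove_common_gaps` looks only at the sequences passed in, dropping the second sequence also removes the fourth column, which illustrates why a different group of sequences can lose a different set of gaps.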

Navigation Tree

Download the navigation tree in Newick format. The tree can be imported into ARB; see the ARB import help for more details.

RNA Distance Matrix

The matrix is written in DNADist format. If two sequences do not overlap, the distance between them is reported as "?". Selecting "Display RDP ID" displays the RDP sequence identifier (e.g., S000380829) in the output file. Selecting "Display Genbank ACCN" displays the GenBank accession and the sequence region (e.g., J01695.1:1518..3059) in the output file.

Uncorrected
Reports uncorrected distances (no correction for multiple substitutions).
Jukes Cantor
Uses the Jukes-Cantor distance correction.
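The two options differ in how the proportion of differing sites, p, is turned into a distance: the uncorrected distance is p itself, while the Jukes-Cantor correction is d = -3/4 ln(1 - 4p/3). The sketch assumes that only columns where both sequences have a base are compared, and reports "?" when no such columns exist, mirroring the non-overlap rule above. How RDP handles ambiguity codes, or p values of 0.75 and above (where the Jukes-Cantor formula is undefined), is not stated here; this sketch treats ambiguity codes as ordinary characters and also reports "?" for p >= 0.75. The function names are hypothetical, and the sketch computes one pairwise value rather than a full DNADist matrix.

```python
import math

GAP_CHARS = set("-.")  # assumption: both characters mark alignment gaps


def p_distance(s1: str, s2: str):
    """Uncorrected distance over columns where both sequences have a base.
    Returns None when the sequences do not overlap."""
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a not in GAP_CHARS and b not in GAP_CHARS]
    if not pairs:
        return None
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor correction d = -3/4 * ln(1 - 4p/3); None where undefined."""
    if p is None or p >= 0.75:
        return None
    return -0.75 * math.log(1 - 4 * p / 3)


def fmt(d) -> str:
    """Format a distance, using '?' when it could not be computed."""
    return "?" if d is None else f"{d:.4f}"


p = p_distance("ACGT-ACGT", "ACGA-ACGT")        # 1 difference in 8 compared sites
print(fmt(p), fmt(jukes_cantor(p)))             # 0.1250 0.1367
print(fmt(p_distance("ACGT----", "----ACGT")))  # ?
```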

Saving/Retrieving SeqCart

Clicking the seqCART link at the top right of the page opens the Sequence Cart page. Clicking the "Save Cart" button saves the RDP sequence identifier and short description of every selected sequence (from both the public RDP release and your aligned private myRDP sequences) to a file.

The saved sequences can be retrieved using the retrieve function below. If the uploaded cart contains myRDP sequence identifiers, only myRDP sequences to which the researcher has valid access rights, either directly or as a Research Buddy, will be retrieved. You can also upload a file containing a list of GenBank accession numbers. Select the correct file content type before you click the "Retrieve Cart" button. If any IDs in the uploaded file cannot be matched to sequences in the current RDP release or your myRDP account, clicking the "View problem IDs" button under the search box in the browser result window opens a new window listing the unknown IDs.

The sequence cart can be reset by clicking the "Reset Cart" button.

Starting Over

Clicking the start over link at the top right of the page takes you back to the start page.
