Probe Match - Help
Probe Match allows you to search the RDP's collection of 16S rRNA sequences using a probe of your choice. This tool will not design probes for you; you will need a third-party program such as PRIMROSE to do this.
Probe Match is used in 2 or 3 steps:
Probes must conform to the following specifications:
- a length of less than 64 bases.
- made up of IUPAC nucleotide codes (ATUCGMRWSYKVHDBN)
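The two requirements above can be checked before submitting a probe. The following is a minimal sketch, not the RDP's own validation code; the function name `validate_probe` is hypothetical, and accepting lower-case input is an assumption.

```python
# Allowed characters, copied from the specification above.
IUPAC_CODES = set("ATUCGMRWSYKVHDBN")
# Probes must be shorter than this many bases.
LENGTH_LIMIT = 64


def validate_probe(probe: str) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    probe = probe.strip().upper()  # assumption: case is not significant
    problems = []
    if len(probe) >= LENGTH_LIMIT:
        problems.append(f"length {len(probe)} is not less than {LENGTH_LIMIT}")
    invalid = sorted(set(probe) - IUPAC_CODES)
    if invalid:
        problems.append("invalid characters: " + "".join(invalid))
    return problems
```

A 63-base probe made only of the listed characters returns an empty list; a 64-base probe, or one containing a character such as 'X', returns one problem description.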
Ambiguity codes in your probe expand to match all associated bases (e.g., an 'N' in your probe will match any base). If you specify your probe as targeting the minus strand of the 16S rRNA gene, your probe will be reverse-complemented before searching.
Specifying an E. coli region restricts your search to only those sequences that have data within that region. E. coli region information is obtained from the RDP alignment. This option can be useful if you are interested in what a probe fails to hit. For example, many 'universal' probes will fail to match more than half the available sequences. This is because most of the sequences in the RDP database are partial sequences, and may or may not contain sequence data in the region the probe targets.
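Ambiguity expansion and reverse-complementing can be illustrated with a short sketch. It uses the standard IUPAC meanings of each code; the function names are hypothetical, the target is assumed to contain only plain bases, and this is not the RDP's own implementation.

```python
# Standard IUPAC expansion of each code to the bases it stands for.
# U is treated as T so that RNA and DNA spellings compare equal.
EXPANSION = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

# Complement of every code; ambiguity codes map to the code whose
# expansion is the complement of their own (for example M=AC -> K=GT).
_COMPLEMENT = str.maketrans("ATUCGMRWSYKVHDBN", "TAAGCKYWSRMBDHVN")


def reverse_complement(probe: str) -> str:
    """Complement each code, then reverse the order."""
    return probe.upper().translate(_COMPLEMENT)[::-1]


def matches_without_errors(probe: str, target: str) -> bool:
    """True if every probe code covers the target base at the same position."""
    target = target.upper().replace("U", "T")
    if len(probe) != len(target):
        return False
    return all(base in EXPANSION[code] for code, base in zip(probe.upper(), target))
```

With these definitions, `reverse_complement("AACGN")` gives `"NCGTT"`, and the probe `"ANG"` matches the targets `"AAG"`, `"ACG"`, `"AGG"`, and `"ATG"`.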
Probe Match displays your probe hits in the RDP's Hierarchy.
- Hierarchy nodes that do not contain any probe hits are grayed out.
- Clicking on any node with probe hits under it will make that node the
root of your view.
- The numbers in parentheses after a hierarchy node, e.g. domain Bacteria
(22/37456), are the number of probe hits and the total number of searched
sequences in the RDP data set.
- The Lineage list allows you to go back to a higher node.
- The number in parentheses after a node in the Lineage list is the total
number of probe hits that occur at that node.
- [List results for this node] takes you to the List View for that node.
- Clicking the triangle next to a node will display the matched sequences
under that node.
Display Depth: Display Depth controls the number of ranks (nodes) displayed in the hierarchy. With the default 'Auto' setting, the program automatically adjusts the depth to display a reasonable number of lines in the hierarchy view. Increase the depth to see more ranks at the same time. Nodes that are grayed out because no probe hits were found in them will not be expanded.
Allowed Errors or Edit Distance: An error is a mismatch, insertion, or deletion that occurs when matching a probe to a sequence. The number of such errors in a match is called the Edit Distance.
You can change the maximum allowed edit distance using the 'Allow Errors' menu on the results page. The maximum edit distance is 1 + 10% of the length of your probe. This maximum is arbitrarily set and has no biological meaning. If no other choices are available under the 'Allow Errors' menu, no matches were found at any other number of errors.
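As a worked example of the cap, the sketch below computes 1 + 10% of the probe length. The help text does not say how a fractional result is rounded; truncating it is an assumption made here, and both function names are hypothetical.

```python
def max_allowed_errors(probe: str) -> int:
    """Upper limit on the edit distance: 1 + 10% of the probe length.

    Assumption: the 10% term is truncated to a whole number.
    """
    return 1 + len(probe) // 10


def possible_error_settings(probe: str) -> list[int]:
    """Every error count from 0 up to the cap, inclusive."""
    return list(range(max_allowed_errors(probe) + 1))
```

Under this assumption a 19-base probe may have at most 2 errors and a 20-base probe at most 3. The menu on the results page may offer fewer settings than `possible_error_settings` returns, since settings with no matches are not listed.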
Data Set Options: Data Set Options alter the types of sequences for which probe hits are displayed.
- Strain: Selecting Type restricts the display to only sequences of known type strains.
- Source: Selecting Uncultured restricts the display to only sequences
of environmental samples. Selecting Isolates restricts the display to
only sequences of isolates. Environmental/Isolate information is obtained from
analyzing annotation data from Genbank.
- Size: Selecting >1200 Bases restricts the display to near-full-length sequences only.
- Quality: View only good quality sequences, suspect quality sequences, or both. Suspect quality sequences are flagged with an asterisk (*).
The List View shows your probe hits in a list format.
- The first column is the RDP Sequence ID of a sequence that matched your
probe. Clicking the sequence ID will take you to its GenBank-formatted RDP
- If you have chosen to allow errors, the second column is the number of mismatches, insertions, or deletions between the probe and the target sequence.
- The third column contains the alignment region where the sequence hit.
Errors are highlighted in red. A probe may match multiple regions in a
sequence. In such cases, only the first and best match is displayed.
- The final column is a short description of the sequence.
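The error count in the second column and the "first and best match" rule for the third can be sketched with a standard approximate string matching recurrence (edit distance in which the match may start anywhere in the target). This is an illustration only, not the RDP's search code. It reads "first and best" as "fewest errors, ties broken by the leftmost end position", which is an assumption; it also compares plain bases and leaves out ambiguity codes for brevity. The function name `best_match` is hypothetical.

```python
def best_match(probe: str, target: str) -> tuple[int, int]:
    """Return (errors, end) for the lowest-error match of probe in target.

    `end` is the number of target bases consumed when the match ends.
    Among matches with equal error counts, the leftmost end is kept.
    """
    m = len(probe)
    # prev[i] = cost of matching probe[:i] against an empty target prefix.
    prev = list(range(m + 1))
    best_errors, best_end = prev[m], 0
    for j, base in enumerate(target, start=1):
        cur = [0]  # a match may start at any target position at no cost
        for i in range(1, m + 1):
            mismatch = 0 if probe[i - 1] == base else 1
            cur.append(min(
                prev[i - 1] + mismatch,  # match or mismatch
                prev[i] + 1,             # extra base in the target
                cur[i - 1] + 1,          # probe base missing from the target
            ))
        if cur[m] < best_errors:  # strict '<' keeps the first of equal matches
            best_errors, best_end = cur[m], j
        prev = cur
    return best_errors, best_end
```

For example, `best_match("ACGT", "TTACGTTT")` finds an exact match ending after the sixth target base, while `best_match("ACGT", "TTAGGTTT")` finds a one-error match ending at the same place.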
Clicking on any of the nodes in the Lineage list will return you to that node in the Hierarchy View.
The Download button will allow you to download a text version of what is displayed. Only sequences under the currently selected node will be downloaded.
List View Options are the same as the Hierarchy View Options.