Seq Match FAQ

From Ribosomal Database Project Wiki

Is there a Seq Match tutorial available?
Yes, there is a video tutorial here.

Is there a command line version of Seq Match?
You may download the Java client here.

What can be input into Seq Match?
Only sequences in FASTA format, saved in a plain text file, may be used.

Can I upload multiple individual query sequences into Seq Match?
Yes. A single FASTA file may contain multiple sequences, and you can submit that file to Seq Match. If your sequences are spread across multiple files, merge them into one file to avoid submitting them one by one.
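
Merging the files can be done with any text tool. As one possible approach, the Python sketch below concatenates several FASTA files into a single file. The function name `merge_fasta` and the file names in the usage note are hypothetical and not part of any RDP tool. The only care taken is to make sure each input ends with a newline, so that the last sequence line of one file does not run into the header of the next.

```python
from pathlib import Path


def merge_fasta(inputs, output):
    """Concatenate FASTA files into one output file (illustrative helper, not an RDP tool)."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            # Guard against a missing trailing newline, which would glue
            # the next file's ">" header onto this file's last sequence line.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
```

For example, `merge_fasta(["run1.fasta", "run2.fasta"], "combined.fasta")` would write both inputs, in order, to `combined.fasta`, which can then be submitted as a single query file.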

When I enter a sequence and try to do Seq Match, plenty of matches come up, yet the similarity score says "not calculated". Why does this happen?
Pairwise similarity scores are calculated only when the query sequences have been selected directly from the RDP Hierarchy Browser and/or from the aligned sequences in myRDP, using the selection button for a group or the check boxes for individual sequences. They are not calculated for sequences that are uploaded as files or copied and pasted. The status of selected sequences can be accessed (viewed, reset, ID uploaded) through seqCart, which serves as the communication channel connecting RDP tools.

What is the difference between the terms Similarity score and S_ab value?
The Similarity score is the pairwise sequence identity, which is calculated from a pairwise alignment. The S_ab score is the percentage of 7-mers shared between two sequences, and it does not require an alignment to calculate.
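
To make the S_ab idea concrete, here is a minimal Python sketch that computes a shared-7-mer fraction for two unaligned sequences. The function names are hypothetical. The normalization is an assumption: the text above only says "percentage of shared 7-mers", and this sketch divides the number of shared unique 7-mers by the smaller of the two unique 7-mer counts. It is not RDP's implementation.

```python
def kmers(seq, k=7):
    """Return the set of unique overlapping k-mers ("words") in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def s_ab(a, b, k=7):
    """Fraction of shared unique k-mers between two sequences.

    Assumption: the denominator is the smaller unique-word count of the two
    sequences. No alignment is performed.
    """
    words_a, words_b = kmers(a, k), kmers(b, k)
    if not words_a or not words_b:
        return 0.0  # a sequence shorter than k has no words to compare
    return len(words_a & words_b) / min(len(words_a), len(words_b))
```

Under this definition, two identical sequences score 1.0 and two sequences with no 7-mer in common score 0.0, regardless of how they would align.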

What is the difference between Seq Match and BLAST results?
Sequence Match finds the nearest neighbors of your query sequences by selecting the database sequences that share the highest number of 7-mers ("words") with each query sequence. It is not based on any alignment. No bit score or E-value is generated. Instead, Seq Match reports an S_ab score, which is the percentage of words shared between the sequences compared.
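
The word-counting search described above can be sketched in Python as follows. This is only an illustration of ranking by shared 7-mers: the function names and the dictionary-based "database" are hypothetical, and the real Seq Match service is certainly organized differently to work at database scale.

```python
import heapq


def words(seq, k=7):
    """Return the set of unique overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nearest_neighbors(query, database, n=3, k=7):
    """Rank database entries by the number of unique k-mers shared with the query.

    `database` maps a sequence name to its sequence (a stand-in for the real
    database). Returns up to n (name, shared_word_count) pairs, best first.
    """
    query_words = words(query, k)
    counts = {name: len(query_words & words(seq, k)) for name, seq in database.items()}
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])
```

Because only word sets are compared, no alignment, gap penalty, or E-value statistics are involved, which is the main practical difference from a BLAST search.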

Degenerate nucleotide codes (like K, S, Y) are treated as mismatches in BLASTN nucleotide alignments. How does RDP interpret these codes?
In RDP's Sequence Match, Classifier, and Library Compare, ambiguity codes are treated as 'N' and ignored (skipped). In RDP alignment and Probe Match, they are kept as they are (K, S, Y, etc.), with different probabilities attached to specific codes.
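
One way to picture the "treated as 'N' and skipped" behavior for word-based tools is the Python sketch below. It rests on an assumption that the text does not spell out: that any 7-mer containing a character other than A, C, G, T, or U is simply left out of the word set. The function name is hypothetical and this is not RDP code.

```python
def unambiguous_words(seq, k=7):
    """Collect unique k-mers, skipping any word that contains an ambiguity code.

    Assumption: a word is dropped if it contains anything other than
    A, C, G, T, or U (so K, S, Y, N, etc. all cause the word to be skipped).
    """
    valid = set("ACGTU")
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= valid:
            found.add(word)
    return found
```

Under this assumption, a single ambiguous base removes every word that overlaps it (up to seven words for 7-mers), rather than counting as a mismatch at one position as in an alignment.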
