Frameshift Correction and Closest Match Assignment by RDP FrameBot

Quick Overview
Steps:
  • Dereplicate sequences
  • Correct frameshift errors
  • Translate nucleotides to amino acids
  • Filter sequences based on percent identity to reference

Input:
  • Nucleotide sequences in FASTA format
  • Reference sequences for target gene (if RDP does not have a model)
  • (Sample input files)

Output:
  • Dereplicated nucleotide file with sample mapping and ids
  • Frameshift corrected protein and nucleotide sequences
  • FrameBot statistics including closest match to reference for each query sequence
  • (Sample output files)

The Process:

The FrameBot tool main page can be found at

RDP FrameBot (Wang et al., 2013. mBio 4:e00592-13; doi: 10.1128/mBio.00592-13) is a tool for correcting frameshift errors caused by insertions and deletions in DNA sequences. Given a set of known protein reference sequences for a gene, FrameBot will take in nucleotide reads and return frameshift-corrected nucleotide and protein sequences and an optimal protein pairwise alignment. FrameBot checks the query DNA sequence in both forward and reverse directions and returns the results in the forward orientation.
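To illustrate why frameshift errors matter, the following minimal Python sketch translates a short DNA sequence with the standard genetic code, then translates the same sequence after a single-base deletion. It also includes a reverse-complement helper, since a read may arrive in either orientation. This is not FrameBot's algorithm; the function names and the example sequence are hypothetical and serve only to show the effect that frameshift correction is meant to undo.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate DNA in frame 0; incomplete trailing codons are dropped.

    Codons containing ambiguous bases are reported as 'X'.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    original = "ATGGCTAAAGGTCAT"              # hypothetical error-free read
    with_deletion = original[:4] + original[5:]  # one base lost (e.g. a sequencing error)

    print(translate(original))       # MAKGH
    print(translate(with_deletion))  # MVKV: every codon after the deletion is shifted
```

A single missing base changes every downstream amino acid, which is why an uncorrected read can look unrelated to its true reference protein.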

RDP currently maintains a set of reference sequences for about twenty genes. These may be selected from the dropdown box on the FrameBot webtool’s input page. Otherwise, the user must supply a file containing protein sequences for the target gene. FrameBot is computationally intensive because it performs an all-against-all comparison between the query DNA and the target protein sequences; we therefore recommend limiting the number of protein target sequences to 200.

The choice of percent identity cutoff is important because FrameBot will filter out sequences based on this value. Each sequence is matched to its best-matching reference sequence, and the percent identity is calculated from their alignment. If this value is below the specified threshold, the sequence is filtered out. For a very stringent filter, use a value around 0.80; for more relaxed settings, keep the default of 0.40. If you're following the tutorial, keep the default settings. Your screen should look like this after uploading the sample input and selecting nifH:

FrameBot webtool
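The percent identity cutoff described above can be sketched in Python as follows. The exact formula FrameBot uses (for example, how gap columns are counted) is not stated here, so this sketch assumes identity is the number of matching columns divided by the total alignment length. The function names and the example alignment are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Fraction of alignment columns where both sequences carry the same residue.

    Assumption: gap columns ('-') count toward the alignment length but never
    count as matches. FrameBot's own definition may differ.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_ref) if q == r and q != "-"
    )
    return matches / len(aligned_query)


def passes_filter(identity: float, cutoff: float = 0.40) -> bool:
    """Keep a sequence unless its identity is below the cutoff."""
    return identity >= cutoff


if __name__ == "__main__":
    identity = percent_identity("MAK-GH", "MAKLGH")  # 5 matches over 6 columns
    print(round(identity, 3), passes_filter(identity))        # kept at the default 0.40
    print(passes_filter(identity, cutoff=0.90))               # rejected at a cutoff of 0.90
```

With the default of 0.40, only reads that are quite distant from every reference are removed; raising the cutoff toward 0.80 keeps only reads that closely resemble a known reference.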

When the job is done, the download will start automatically if the web page is still open, and an email with a link to the results will be sent to you.

Submitting your sequences to FrameBot in this way not only performs frameshift correction and translation of DNA to protein but also dereplicates the sequences. In the main output directory you will find the dereplicated sequence file and the corresponding sample and id files.
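As a rough illustration of what dereplication means, the sketch below collapses identical sequences into one representative and records which original ids each representative stands for. It works on in-memory data only; the layout of RDP's actual sample mapping and id files is not described here, and the function name and read ids are hypothetical.

```python
def dereplicate(records):
    """Collapse identical sequences.

    records: iterable of (seq_id, sequence) pairs.
    Returns (unique, id_map), where unique is a list of
    (representative_id, sequence) pairs and id_map maps each representative
    id to the list of all ids that share that sequence.
    Assumption: sequences are compared case-insensitively and must be
    identical over their full length.
    """
    groups = {}
    for seq_id, seq in records:
        groups.setdefault(seq.upper(), []).append(seq_id)

    unique = []
    id_map = {}
    for seq, ids in groups.items():
        representative = ids[0]  # the first id seen represents the group
        unique.append((representative, seq))
        id_map[representative] = ids
    return unique, id_map


if __name__ == "__main__":
    reads = [("r1", "ATGC"), ("r2", "atgc"), ("r3", "GGGA")]
    unique, id_map = dereplicate(reads)
    print(unique)  # [('r1', 'ATGC'), ('r3', 'GGGA')]
    print(id_map)  # {'r1': ['r1', 'r2'], 'r3': ['r3']}
```

Keeping the id mapping is what allows per-read and per-sample counts to be recovered later, even though each distinct sequence is processed only once.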

FrameBot has six output files:

The framebot.txt file contains the pairwise alignment and many important statistics for each sequence.

In the graphic above, a deletion and an insertion are highlighted; in the nucleotide sequence, each is replaced by the number of nucleotides that were present before the correction was made. The STATS line above each alignment contains the values for percent identity, length, score and number of frameshifts.

The FrameBot nearest neighbor assignments can be used to group reads by relative abundances of the nearest matches, or to view the differences between samples using ordination analysis.
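Grouping reads by nearest match can be sketched as a simple counting step. The example below assumes the nearest neighbor assignments have already been reduced to (sample, nearest match) pairs, one per read; how those pairs are extracted from the FrameBot output files is not shown, and the sample and reference names are hypothetical.

```python
from collections import Counter, defaultdict


def relative_abundance(assignments):
    """Compute, per sample, the fraction of reads assigned to each nearest match.

    assignments: iterable of (sample, nearest_match) pairs, one per read.
    Returns {sample: {nearest_match: fraction}}; fractions in a sample sum to 1.
    """
    counts = defaultdict(Counter)
    for sample, match in assignments:
        counts[sample][match] += 1

    result = {}
    for sample, counter in counts.items():
        total = sum(counter.values())
        result[sample] = {match: n / total for match, n in counter.items()}
    return result


if __name__ == "__main__":
    pairs = [
        ("sample1", "refA"),
        ("sample1", "refA"),
        ("sample1", "refB"),
        ("sample2", "refB"),
    ]
    for sample, fractions in relative_abundance(pairs).items():
        print(sample, fractions)
```

The resulting sample-by-reference table of fractions is the kind of input an ordination method such as PCA would take.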

PCA Analysis Using FrameBot Nearest Neighbor Assignments:


Relative abundances of NEON reads grouped by nearest matches at the phylum and class levels:

