RDP Aligner

Quick Overview
TASKS

  1. Align sequences to RDP’s rRNA models using the Infernal aligner

INPUT

Sequence file(s) in FASTA format

(Sample input files)

OUTPUT

Compressed results folder containing:

  1. Aligned sequences in FASTA format (single or multiple files)
  2. Histogram showing start and end positions
  3. Summary statistics text file
  4. Alignment score file
  5. Infernal summary file

(Sample output files)


The Process:

The sequences produced by the initial processing should be aligned in preparation for clustering. The RDP Aligner tool aligns the processed and trimmed sequences using the secondary-structure-aware Infernal aligner (Nawrocki & Eddy, 2007). Separate alignment models are provided for Bacteria, Archaea and Fungi. Sequences are corrected to plus (+) strand orientation before they are aligned. The aligner accepts at most 1,000,000 sequences, and each sequence must be at least 50 bases long. Sequences should contain only IUPAC nucleotide codes; all gaps are ignored. Multiple FASTA files can be aligned either as a single sample or as separate samples. FASTA files may also be compressed before upload to shorten transfer times.
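The input limits above can be checked locally before uploading. The following is a minimal sketch and not RDP's own validation: the function names `read_fasta` and `check_sequences` are hypothetical, the gap characters are assumed to be `-` and `.`, and the minimum length is assumed to apply after gaps are removed.

```python
from typing import Dict, Iterator, List, Tuple

# Limits quoted in the text above.
MAX_SEQUENCES = 1_000_000
MIN_LENGTH = 50

# IUPAC nucleotide codes (U included for RNA input).
IUPAC_CODES = set("ACGTURYSWKMBDHVN")
# Characters treated as gaps; this set is an assumption.
GAP_CHARS = set("-.")


def read_fasta(lines) -> Iterator[Tuple[str, str]]:
    """Yield (sequence id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def check_sequences(lines) -> Dict[str, List[str]]:
    """Return problems found, keyed by sequence id ('*' for file-level issues)."""
    problems: Dict[str, List[str]] = {}
    count = 0
    for seq_id, seq in read_fasta(lines):
        count += 1
        # Gaps are ignored by the aligner, so drop them before checking.
        bases = [c for c in seq.upper() if c not in GAP_CHARS]
        issues = []
        if len(bases) < MIN_LENGTH:
            issues.append(f"length {len(bases)} is below {MIN_LENGTH}")
        bad = sorted(set(bases) - IUPAC_CODES)
        if bad:
            issues.append("non-IUPAC characters: " + "".join(bad))
        if issues:
            problems[seq_id] = issues
    if count > MAX_SEQUENCES:
        problems["*"] = [f"{count} sequences exceeds {MAX_SEQUENCES}"]
    return problems
```

To check a file, pass an open file handle, for example `check_sequences(open("sample_trimmed.fasta"))`, where the file name is only a placeholder. An empty result means no problems were found under these assumptions.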

If your sequence reads do not cover the complete amplicon or the same gene region, or if the quality drops at the distal end, you need to trim the reads so that they cover the identical gene region before continuing with downstream analysis (such as clustering). Determining the best trimming position can be difficult; feel free to contact RDP staff for help.
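Once a region has been chosen, cutting aligned reads down to it is mechanical. The sketch below illustrates one possible approach and is not an RDP tool: `trim_to_region` is a hypothetical helper, positions are taken to be 1-based alignment columns, gaps are assumed to be `-` or `.`, and a read is dropped unless it has bases at or before the start column and at or after the end column.

```python
from typing import Dict

GAP_CHARS = "-."  # assumed gap symbols in the aligned sequences


def trim_to_region(aligned: Dict[str, str], start: int, end: int) -> Dict[str, str]:
    """Keep columns start..end (1-based, inclusive) of each aligned sequence.

    Sequences that do not span the whole region are dropped.
    """
    kept = {}
    for seq_id, seq in aligned.items():
        # 1-based columns that hold a base rather than a gap.
        bases = [i for i, c in enumerate(seq, start=1) if c not in GAP_CHARS]
        if not bases or bases[0] > start or bases[-1] < end:
            continue  # read does not span the whole region
        kept[seq_id] = seq[start - 1:end]
    return kept
```

Choosing `start` and `end` remains the hard part; the code only applies a decision that has already been made.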


If you upload multiple files . . . compress all of the FASTA files, one from each subdirectory (for initial processing results), into a single zip file. To do this, copy all of the *trimmed.fasta files into one folder and compress as shown on Mac here:

[Image: to "zip" files]
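The same step can be scripted on any platform. This is a minimal sketch using Python's standard `zipfile` module; the function name `zip_trimmed_fasta` and the paths in the usage line are hypothetical, and the script assumes the *trimmed.fasta files have distinct names so they can sit side by side in one archive.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_trimmed_fasta(root: str, archive: str) -> List[str]:
    """Collect every *trimmed.fasta file under root into one zip archive."""
    files = sorted(Path(root).rglob("*trimmed.fasta"))
    names: List[str] = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store files flat, as if copied into a single folder.
            if path.name in names:
                raise ValueError(f"duplicate file name: {path.name}")
            zf.write(path, arcname=path.name)
            names.append(path.name)
    return names
```

For example, `zip_trimmed_fasta("initial_process_results", "trimmed_fasta.zip")` would gather the files from every subdirectory of a folder with that name and return the list of names stored in the archive.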

If you upload multiple files . . . you will be given the option of combining the files into one alignment or creating a separate alignment file for each. This choice is offered on a new page that loads after you click the initial "Submit" button and the file upload(s) finish. For this tutorial, all of the sequences will be aligned as one sample using the alignment model "Bacteria".

The results are emailed to the email address provided and may look like this:

[Image: results files]

The results file . . . contains a FASTA file of aligned sequences, a histogram in .png format showing the number of sequences at each start and end position, and a statistics file.

[Image: histogram]
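A comparable tally of start and end positions can be computed from the aligned FASTA file itself. The sketch below rests on stated assumptions: gaps are written as `-` or `.`, positions are 1-based alignment columns (which may differ from the reference-sequence positions RDP reports), and the names `aligned_span` and `position_counts` are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

GAP_CHARS = "-."  # assumed gap symbols in the aligned FASTA


def aligned_span(aligned_seq: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based (start, end) columns of the non-gap region,
    or None if the sequence contains only gaps."""
    columns = [i for i, c in enumerate(aligned_seq, start=1) if c not in GAP_CHARS]
    if not columns:
        return None
    return columns[0], columns[-1]


def position_counts(aligned: Dict[str, str]) -> Tuple[Counter, Counter, int]:
    """Tally start and end columns over all sequences; count gap-only ones."""
    starts, ends = Counter(), Counter()
    gap_only = 0
    for seq in aligned.values():
        span = aligned_span(seq)
        if span is None:
            gap_only += 1
            continue
        starts[span[0]] += 1
        ends[span[1]] += 1
    return starts, ends, gap_only
```

The two counters hold the number of sequences per start and end column, which is the information a start/end histogram displays, and the third value counts sequences that contain only gaps.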

The summary statistics file . . . contains 6 summary lines:

Line 1: Total sequences submitted for alignment
Line 2: Alignment model used (bacteria or archaea)
Line 3: Model reference sequence (base positions listed are positions in this sequence)
Line 4: Number of sequences containing only gaps
Line 5: Average number of comparable positions
Line 6: Average sequence length (measured as the distance between the reference beginning and end position)

The remainder of the stats file lists the sequence ids and the corresponding start and stop positions on the model.

[Image: summary stats file]

