Performing Complete Linkage Clustering
TASKS | INPUT | OUTPUT |
---|---|---|
Complete linkage clustering | Aligned sequence file(s) in FASTA format | Dereplicated sequence file in FASTA format with id and sample file; a folder named "clustering" that contains the actual *.clust output file and for each sample: |
The Process:
The RDP mcClust complete linkage clustering tool (from the FunGene Pipeline) works for both nucleotide and protein sequences, whereas the cluster tool on RDP's Pyrosequencing Pipeline site only works for nucleotide sequences. This tutorial uses mcClust to illustrate how the cluster tools work.
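To make the idea of complete linkage concrete, here is a minimal sketch in Python. It is not mcClust's implementation; the function name `complete_linkage`, the toy sequence labels, and the distances are all made up for illustration. Under complete linkage, the distance between two clusters is the largest pairwise distance between their members, so two clusters merge only if every member of one is within the cutoff of every member of the other.

```python
from itertools import combinations


def complete_linkage(dist, cutoff):
    """Greedy complete-linkage clustering (illustrative sketch only).

    dist:   dict mapping frozenset({a, b}) -> pairwise distance
    cutoff: largest cluster distance at which a merge is still allowed
    Returns a list of sets, one set of labels per cluster.
    """
    names = sorted({name for pair in dist for name in pair})
    clusters = [{name} for name in names]

    def linkage(c1, c2):
        # Complete linkage: use the LARGEST distance between any two members.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # no remaining pair is close enough to merge
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical pairwise distances between four toy sequences.
toy = {
    frozenset(("A", "B")): 0.01,
    frozenset(("B", "C")): 0.02,
    frozenset(("A", "C")): 0.04,
    frozenset(("A", "D")): 0.20,
    frozenset(("B", "D")): 0.22,
    frozenset(("C", "D")): 0.25,
}

print(complete_linkage(toy, 0.03))
print(complete_linkage(toy, 0.05))
```

In this toy data, C is only 0.02 from B but 0.04 from A, so at a cutoff of 0.03 it stays out of the A/B cluster and joins only once the cutoff reaches 0.04 or more.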
This complete linkage clustering tool builds a cluster file from one or more aligned sequence files (the output from the RDP Infernal Aligner or HMMER3 Aligner). Each sequence file must be an aligned FASTA file. If a submission contains multiple aligned files, they should all be aligned to the same model. As with the RDP Aligner, multiple files may be compressed (zipped) into a single file before submission.
If you are following the tutorial, you will have four aligned files to upload. To cluster these FASTA files together, first compress them into a single compressed file and then upload that file.
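Any zip utility will do for this step. As one possible approach, the sketch below bundles several aligned FASTA files into a single zip archive with Python's standard `zipfile` module. The helper name `zip_alignments` and the file names in the usage comment are hypothetical; substitute the names of your own aligned files.

```python
import zipfile
from pathlib import Path


def zip_alignments(fasta_paths, archive_path):
    """Write the given aligned FASTA files into one zip archive.

    Files are stored under their base names so the archive has no
    nested directories.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in fasta_paths:
            path = Path(path)
            zf.write(path, arcname=path.name)
    return archive_path


# Example usage (hypothetical file names):
# zip_alignments(
#     ["sample1_aligned.fasta", "sample2_aligned.fasta"],
#     "aligned_for_clustering.zip",
# )
```

Storing each file under its base name keeps the archive flat, which avoids surprises if the files were collected from different folders.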
Users may choose the maximum distance (to specify a 3% distance, enter 0.03) and the step size (the increment between successive cluster distances) for their clustering run by entering values into the boxes. For the tutorial example, we use a Distance Cutoff of 0.1 and a Step of 0.01. Users can also choose whether to cluster all submitted FASTA files together or separately. Once a clustering job is finished, the compressed results file is emailed to the address provided.
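To see how the cutoff and step interact, the short Python sketch below lists the distances implied by a given pair of values. The function name `cluster_distances` is hypothetical, and the assumption that the series starts at 0.0 and includes the cutoff itself is ours; check your own *.clust output for the exact distances reported.

```python
def cluster_distances(cutoff, step):
    """Return the distances 0, step, 2*step, ... that do not exceed cutoff.

    Assumes the series starts at 0.0 (an assumption, not confirmed
    mcClust behavior). Values are rounded to suppress floating-point noise.
    """
    # Small epsilon guards against results like 9.999999999 from float division.
    n_steps = int(cutoff / step + 1e-9)
    return [round(i * step, 10) for i in range(n_steps + 1)]


# Tutorial settings: Distance Cutoff of 0.1 and Step of 0.01.
distances = cluster_distances(0.1, 0.01)
print(len(distances), distances)
```

With the tutorial settings this gives eleven distances, so a 3% distance (0.03) is one of the levels covered by the run.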
Figure: A clustering job in progress.
Figure: Output files from clustering run.
Figure: Clustering results graph for the Native_1_2_A sample.