Using the .clust Results File

The Process:

The *.clust file may be submitted to other RDP tools to calculate the Shannon and Chao1 indices, the Jaccard and/or Sørensen indices, or rarefaction, or to convert it to an R-compatible community data matrix. When using any of these downstream tools, make sure that the distance cutoffs you plan to specify are already included in the uploaded .clust file.

Performing Statistical Analysis with R/Bioconductor package Phyloseq . . . Convert the .clust file to biom format, then add classification and sample data to make a rich-dense biom file.

Generate OTU table (R-compatible community data matrix) . . . may be generated from a complete linkage cluster file. The only input required is a *.clust file; you will also need to choose the lower and upper distance cutoff values (valid range is 0.0 to 0.5). The words "aligned_" and "_trimmed" will be removed from the sample name if present.
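The sample-name cleanup described above can be sketched in a few lines of Python. This is only an illustration: the function name clean_sample_name is hypothetical, and it assumes the two words are removed wherever they occur in the name, which the description implies but does not spell out.

```python
def clean_sample_name(name: str) -> str:
    """Remove the words the OTU table tool is described as stripping.

    Assumption: the words are removed wherever they appear in the name.
    """
    for word in ("aligned_", "_trimmed"):
        name = name.replace(word, "")
    return name


print(clean_sample_name("aligned_soil1_trimmed"))  # soil1
```

Knowing the cleaned names in advance makes it easier to match rows of the community data matrix to your own sample metadata in R.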

R format clustering data

The Rarefaction tool . . . requires only a *.clust file as input and an email address to which the results are sent. Results from rarefaction include a text file containing the rarefaction data and a graph showing some, but not all, of the rarefaction data.

The first row in the text file contains column headers corresponding to each distance in the cluster file. The column headers marked with "U" and "L" indicate the upper and lower 95% confidence limits for the data at that distance. Open the text file in Excel to graph the data.

Rarefaction spreadsheet

Sample Abundance Statistics (Jaccard and Sørensen Index) . . . calculates the difference between samples based on the Jaccard and Sørensen methods. This requires a clustering file with more than one sample, which can be created by running Alignment Merger on the alignments of multiple samples and then running Complete Linkage Clustering.
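For orientation, the classic incidence-based (presence/absence) forms of the two measures can be sketched as follows. This is a minimal illustration, not the RDP implementation: the tool's name suggests it may use abundance-based variants, and the function names and OTU labels below are hypothetical.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two sets of OTU identifiers."""
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(a & b) / len(union)


def sorensen_distance(a: set, b: set) -> float:
    """1 - 2|A ∩ B| / (|A| + |B|) for two sets of OTU identifiers."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / total


# Hypothetical OTU membership for two samples at one distance cutoff.
sample_a = {"otu1", "otu2", "otu3"}
sample_b = {"otu2", "otu3", "otu4", "otu5"}

print(jaccard_distance(sample_a, sample_b))
print(sorensen_distance(sample_a, sample_b))
```

Both measures are 0 for samples that share every cluster and 1 for samples that share none; Sørensen weights the shared clusters more heavily than Jaccard does.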

After uploading your *.clust file, you will need to specify the range of distance cutoffs (from clustering) at which the Jaccard and Sørensen indices should be calculated. The step size for distance cutoffs is 0.01, so the default settings of 0.0 to 0.1 will produce 11 sets of output. Each set of output includes six files: three each for the Jaccard and Sørensen distances.

  • Text-based result matrix for sample distance calculations (Ex. 0.0_jaccard.txt)
  • Tree style organization and representation of distance (Ex. 0.0_jaccard_tree.png)
  • Heat map representation of sample statistics in text file (Ex. 0.0_jaccard.png)
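The arithmetic behind the output count can be checked with a short Python sketch. The Jaccard file names follow the examples in the list above; the "sorensen" spelling and the naming for cutoffs other than 0.0 are assumptions, and the helper names are hypothetical.

```python
def cutoffs(lower: float, upper: float, step: float = 0.01) -> list:
    """Distance cutoffs from lower to upper (inclusive) in fixed steps."""
    n_steps = round((upper - lower) / step)
    # Rounding avoids floating-point drift such as 0.30000000000000004.
    return [round(lower + i * step, 2) for i in range(n_steps + 1)]


def expected_files(cutoff: float) -> list:
    """Six file names per cutoff: matrix, tree, and heat map per method."""
    names = []
    for method in ("jaccard", "sorensen"):  # "sorensen" spelling is assumed
        prefix = f"{cutoff}_{method}"
        names += [f"{prefix}.txt", f"{prefix}_tree.png", f"{prefix}.png"]
    return names


levels = cutoffs(0.0, 0.1)
print(len(levels))                                  # 11
print(sum(len(expected_files(c)) for c in levels))  # 66
```

With the default range, the 11 cutoffs therefore yield 66 files in total.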

Diversity Statistics (Shannon and Chao1 Index) . . . upload a clustering output file containing one or more samples. The input cluster file name must end with the ".clust" extension.
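As a reminder of what these indices measure, here is a minimal Python sketch computed from per-cluster sequence counts. It uses the standard textbook definitions with natural logarithms and the bias-corrected form of Chao1; whether the RDP tool uses exactly these variants is an assumption, and the function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over non-empty clusters."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def evenness(counts):
    """Evenness E = H' / ln(S), where S is the number of clusters."""
    s = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # clusters seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # clusters seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical cluster sizes for one sample at one distance cutoff.
counts = [10, 5, 2, 1, 1, 1]
print(shannon(counts), evenness(counts), chao1(counts))
```

Chao1 estimates total richness by adding a correction based on singleton and doubleton clusters to the observed cluster count, so it grows when many clusters contain only one sequence.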

The result . . . is a tab-delimited file that contains the columns sampleID, distance, N, clusters, chao, LCI95, UCI95, H', varH, and E. Import the text file into Excel as tab-delimited for easy viewing.
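If you prefer a script to Excel, the same file can be read with Python's standard csv module. The sketch below assumes the file begins with a header row using exactly the column names listed above; the function name is hypothetical, and the single data row is invented placeholder data for illustration only, not real tool output.

```python
import csv
import io


def read_diversity_table(handle):
    """Read the tab-delimited results into a list of dicts keyed by column."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Invented placeholder row standing in for a real results file.
example = io.StringIO(
    "sampleID\tdistance\tN\tclusters\tchao\tLCI95\tUCI95\tH'\tvarH\tE\n"
    "soil1\t0.03\t100\t20\t25.0\t21.0\t40.0\t2.5\t0.01\t0.83\n"
)

rows = read_diversity_table(example)
# Keep only the rows for one distance cutoff.
at_003 = [r for r in rows if float(r["distance"]) == 0.03]
print(at_003[0]["sampleID"], at_003[0]["chao"])
```

For a real file, replace the io.StringIO object with open(path, newline="") on the downloaded results.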

Shannon and Chao statistics in Excel

