Functional Gene Unsupervised Workflow
Unsupervised processing and analysis of functional gene data starts with the initial processing tool. You may follow this tutorial using your own sequences, or you may download and use the input files provided in the Pipeline Initial Processing tutorial. Prepare your input: a sequence file in FASTA or SFF format, a primer sequence, and a tag file. Then follow the Pipeline Initial Processing tutorial to begin processing your data.
The sequencing process can introduce insertions and deletions that cause frameshift errors, leading to incorrect nucleotide and protein sequences. RDP's FrameBot is a tool for detecting and correcting these errors. FrameBot first dereplicates the sequences from initial processing and then aligns them to reference sequences. Sequences with low percent identity to the reference sequences are filtered out.
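The dereplication and identity-filtering ideas described above can be illustrated with a minimal Python sketch. This is not FrameBot's implementation: the function names, the gap character, the way percent identity is computed (matches over columns where neither sequence has a gap), and the 40% default cutoff are all assumptions made for illustration, and the alignment and frameshift correction steps are not shown.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences (case-insensitive) into {sequence: copy_count}."""
    return Counter(seq.upper() for seq in sequences)


def percent_identity(aligned_query, aligned_ref):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs are already aligned to the same length and use
    '-' as the gap character. Other definitions of percent identity exist.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, r in zip(aligned_query, aligned_ref):
        if q == "-" or r == "-":
            continue  # skip gapped columns
        compared += 1
        if q == r:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_by_identity(aligned_pairs, min_identity=40.0):
    """Keep (query, reference) aligned pairs at or above the cutoff.

    The 40.0 default is an arbitrary illustrative value.
    """
    return [pair for pair in aligned_pairs if percent_identity(*pair) >= min_identity]
```

For example, `dereplicate(["ACGT", "acgt", "ACGA"])` counts two copies of `ACGT` and one of `ACGA`, so each unique sequence only needs to be aligned once.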
The frameshift-corrected, dereplicated sequences from FrameBot need to be aligned before clustering and analysis. For accurate alignment of protein sequences, use the HMMER3 aligner. If your gene is one of the roughly twenty that RDP already maintains, simply choose it from the drop-down box on the aligner page. Otherwise, a model can be built for any gene for which you have a reliable set of seed sequences. Contact RDP at firstname.lastname@example.org if you'd like to arrange for a model to be made.
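For readers curious what building a model from seed sequences and aligning against it looks like outside the web interface, here is a rough Python sketch that assembles the corresponding commands for the standalone HMMER3 suite (`hmmbuild` and `hmmalign`). It assumes HMMER3 is installed locally; all file names are hypothetical, and this is not necessarily how RDP's online aligner is configured.

```python
import subprocess


def hmmer_commands(seed_alignment, model_path, protein_fasta, aligned_out):
    """Return the hmmbuild and hmmalign command lines as argument lists.

    seed_alignment: multiple alignment of trusted seed sequences (hypothetical path)
    model_path:     where hmmbuild writes the profile HMM
    protein_fasta:  protein sequences to align, e.g. FrameBot output
    aligned_out:    file that receives the alignment from hmmalign
    """
    build = ["hmmbuild", model_path, seed_alignment]
    align = ["hmmalign", "-o", aligned_out, model_path, protein_fasta]
    return build, align


def run_hmmer(build, align):
    """Run both commands in order; raises CalledProcessError if either fails."""
    for cmd in (build, align):
        subprocess.run(cmd, check=True)
```

Calling `hmmer_commands("seeds.sto", "gene.hmm", "framebot_prot.fasta", "aligned.sto")` returns the two argument lists, which can be inspected before passing them to `run_hmmer`.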
Next, take the output from the aligner and submit all of the aligned files to the clustering tool, as in the Clustering tutorial. When given the option, choose to cluster the sequence files as separate samples.
The output from clustering, a .clust file, is useful for generating a wide range of statistical information about a set of samples. Once your .clust file is ready, you can submit it to multiple tools on RDP's website, import it into R for further analysis such as ordination and heat maps, or both. RDP provides four tools that can be used at this point for analysis.
Information on the tools and their use can be found in the *.clust results tutorial.
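As a rough illustration of the kind of per-sample statistic that can be derived from clustering output, the sketch below computes the Shannon diversity index from a list of cluster sizes for each sample. It does not parse the .clust format; the `samples` dictionary and its values are hypothetical stand-ins for counts you would extract from your own results.

```python
import math


def shannon_index(cluster_sizes):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over non-empty clusters."""
    total = sum(cluster_sizes)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in cluster_sizes if n > 0)


# Hypothetical number of sequences per cluster, for each sample.
samples = {
    "sample_A": [12, 7, 1],
    "sample_B": [5, 5, 5, 5],
}

diversity = {name: shannon_index(sizes) for name, sizes in samples.items()}
```

A sample whose sequences are spread evenly across many clusters scores higher than one dominated by a single cluster, which is why `sample_B` comes out above `sample_A` here.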