Running the Defined Community Analysis
The Defined Community Analysis tool main page can be found at http://fungene.cme.msu.edu/FunGenePipeline/error_analysis/form.spr.
This tool compares input nucleotide reads to the set of known sequences for the amplification targets in the sequenced DNA. It determines the numbers and types of errors present in the reads. It may also help you choose appropriate quality filters for datasets from the same sequencing run.
The tool requires a sequence file and a reference sequence file. The sequence file, obtained from the sequencing center, can be in FASTA format or FASTQ format (which contains both the sequence and quality information). The defined community reads and the reference sequences should cover the same region of the gene. If they do not, you can trim the reference sequences to the amplicon region by using the Initial Processing Tool with the corresponding forward and reverse primers.
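To illustrate what a FASTQ file carries beyond the sequence itself, here is a minimal Python sketch that reads four-line FASTQ records and decodes the quality string into per-base Q scores. It assumes Phred+33 (Sanger) quality encoding and unwrapped four-line records. The function name read_fastq and the two example reads are made up for illustration and are not part of the FunGene Pipeline.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, q_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            break
        if not header.strip():  # skip stray blank lines between records
            continue
        sequence = handle.readline().strip()
        handle.readline()       # '+' separator line, ignored
        quality = handle.readline().strip()
        # Assumption: Phred+33 encoding, so Q = ASCII code - 33.
        q_scores = [ord(ch) - 33 for ch in quality]
        yield header.strip()[1:], sequence, q_scores


# Two made-up reads, used only to show the record layout.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACG\n+\n5+!\n")
records = list(read_fastq(example))

for read_id, sequence, q_scores in records:
    print(read_id, sequence, q_scores)
```

To try it on a real file, replace the StringIO example with an opened file handle, for instance the sample file mid01_trimmed.fastq.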
Before running the analysis, you need to check your input reads for chimeras (with UCHIME) and for contamination (with SeqMatch or BLAST). See our publication "FunGene: the functional gene pipeline and repository".
Download the sample input files . . . for this tutorial -- the sample input zip file contains the following four files:
* a FASTQ sequence file (mid01_trimmed.fastq)
* a reference sequence file (nifH_control_refseq_nucl_slice.fa)
Uploading your data to the web interface.
Output . . . contains the following files (download sample output zip file):
* Error rate of reads by read Q score
* Percentage of sequences with a certain number of errors
* Percentage of sequences matching each defined community organism by read Q score
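The kind of summaries listed above can be sketched in a few lines of Python. This is not the tool's implementation: the per-read tuples are hypothetical values invented for illustration, and the definitions used here (error rate as total errors divided by total bases within each read Q score group) are assumptions, since the exact formulas are not given on this page.

```python
from collections import Counter, defaultdict

# Hypothetical per-read results: (read_length, error_count, read_q_score).
# In practice these would come from comparing each read to its reference.
reads = [
    (100, 0, 35),
    (100, 1, 35),
    (100, 0, 30),
    (100, 4, 20),
]


def percent_by_error_count(reads):
    """Percentage of reads having each observed number of errors."""
    counts = Counter(errors for _, errors, _ in reads)
    total = len(reads)
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


def error_rate_by_q(reads):
    """Errors per base, grouped by read Q score (assumed definition)."""
    bases = defaultdict(int)
    errors = defaultdict(int)
    for length, err, q in reads:
        bases[q] += length
        errors[q] += err
    return {q: errors[q] / bases[q] for q in sorted(bases)}


print(percent_by_error_count(reads))
print(error_rate_by_q(reads))
```

Tables like these show how the error rate changes with read Q score, which is the relationship to look at when choosing a quality filter threshold.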