Nebel, Markus E. and Wild, Sebastian and Holzhauser, Michael and Hüttenberger, Lars and Reitzig, Raphael and Sperber, Matthias and Stoeck, Thorsten
JAguc – a software package for environmental diversity analyses
Journal of Bioinformatics and Computational Biology
2011
Background The study of microbial diversity and community structure relies heavily on the analysis of sequence data, predominantly taxonomic marker genes such as the small subunit ribosomal RNA (SSU rRNA) gene amplified from environmental samples. Until recently, the “gold standard” for this strategy was the cloning and Sanger sequencing of amplified target genes, usually restricted to a few hundred sequences per sample because of relatively high costs and labor intensity. The recent introduction of massively parallel tag sequencing strategies such as pyrosequencing (454 sequencing) has opened a new window into microbial biodiversity research. Because it is fast and relatively inexpensive, this strategy produces millions of environmental SSU rDNA sequences, granting the opportunity to gain deep insights into the true diversity and complexity of microbial communities. The bottleneck, however, is the computational processing of these massive sequence data sets, without which biologists can hardly exploit the full information they contain.
Results The freely available standalone software package JAguc implements a broad range of functions that allow efficient and convenient processing of a huge number of sequence tags. These include importing custom-made reference databases for basic local alignment searches, user-defined quality and search filters for analyses of specific sets of sequences, pairwise alignment-based sequence similarity calculations and clustering, and sampling saturation and rank abundance analyses. In initial applications, JAguc successfully analyzed hundreds of thousands of sequences (eukaryote SSU rRNA genes) from aquatic samples and was also applied to quality assessments of different pyrosequencing platforms.
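To make the last two analyses concrete, the following minimal Python sketch shows what a rank abundance list and a sampling saturation curve look like once sequences have been assigned to clusters. It is not JAguc's implementation: the function names `rank_abundance` and `saturation_curve`, the cluster labels, and the use of a single random sampling order are all assumptions made for illustration.

```python
import random
from collections import Counter


def rank_abundance(cluster_labels):
    """Return cluster sizes sorted from most to least abundant."""
    counts = Counter(cluster_labels)
    return sorted(counts.values(), reverse=True)


def saturation_curve(cluster_labels, step, seed=0):
    """Count distinct clusters seen while sampling sequences in random order.

    Returns (sequences_sampled, clusters_seen) pairs, recorded every
    `step` sequences and once more at the end of the data.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(cluster_labels)
    rng.shuffle(shuffled)

    seen = set()
    curve = []
    for i, label in enumerate(shuffled, start=1):
        seen.add(label)
        if i % step == 0 or i == len(shuffled):
            curve.append((i, len(seen)))
    return curve


# Hypothetical input: one cluster label per sequence.
labels = ["otu1"] * 5 + ["otu2"] * 3 + ["otu3"] * 1 + ["otu4"] * 1

print(rank_abundance(labels))          # [5, 3, 1, 1]
print(saturation_curve(labels, step=5))
```

A curve that flattens as more sequences are sampled suggests that most clusters in the sample have been observed. A more careful analysis would typically average over many random orderings rather than rely on one.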
Conclusions The new program package JAguc is a tool that bridges the gap between computational and biological sciences. It enables biologists to process large sequence data sets and to infer biological meaning from hundreds of thousands of raw sequences. JAguc offers advantages over available tools, which are discussed further in this manuscript. While it provides a highly efficient implementation of functionality tailored to typical molecular environmental diversity analyses, JAguc is not restricted to environmental pyrosequencing data; it is applicable to a broad array of further tasks, including motif searches and (meta)transcriptome analyses.