Singapore Centre on Environmental Life Sciences Engineering, School of Biological Sciences, Nanyang Technological University, Singapore 637551, Center for Bioinformatics, University of Tübingen, 72076 Tübingen, Germany and Life Sciences Institute, National University of Singapore, Singapore 117456.
Bioinformatics. 2014 Jan 1;30(1):38-9. doi: 10.1093/bioinformatics/btt254. Epub 2013 May 7.
In the context of metagenomics, we introduce PAUDA, a new approach to protein database search that runs ~10,000 times faster than BLASTX. PAUDA assigns reads to KEGG orthology groups at about one-third of BLASTX's assignment rate, yet produces gene and taxon abundance profiles that are highly correlated with those obtained with BLASTX. PAUDA requires <80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil. A previous BLASTX analysis of a 176-million-read subset of this dataset reportedly required 800,000 CPU hours, and both analyses lead to the same clustering of samples by functional profiles.
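The speed comparison above can be checked with a short back-of-the-envelope calculation. The sketch below uses only the figures quoted in the abstract; the function name and the assumption that CPU cost scales linearly with the number of reads are illustrative choices, not part of the published method. Because the PAUDA figure is an upper bound (<80 CPU hours), the resulting ratio is a lower bound.

```python
# Figures quoted in the abstract. The PAUDA value is an upper bound ("<80").
PAUDA_CPU_HOURS = 80
PAUDA_READS = 246_000_000
BLASTX_CPU_HOURS = 800_000
BLASTX_READS = 176_000_000


def cpu_seconds_per_read(cpu_hours, reads):
    """Average CPU seconds spent per read (hypothetical helper)."""
    return cpu_hours * 3600 / reads


# Assumption: cost scales linearly with read count, so per-read costs
# from datasets of different sizes can be compared directly.
pauda_cost = cpu_seconds_per_read(PAUDA_CPU_HOURS, PAUDA_READS)
blastx_cost = cpu_seconds_per_read(BLASTX_CPU_HOURS, BLASTX_READS)
speedup = blastx_cost / pauda_cost

print(f"PAUDA:  {pauda_cost:.5f} CPU s/read (upper bound)")
print(f"BLASTX: {blastx_cost:.2f} CPU s/read")
print(f"Per-read speedup: at least {speedup:,.0f}x")
```

Under these assumptions the per-read ratio comes out at roughly 14,000, which agrees in order of magnitude with the ~10,000-fold speedup stated above.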
PAUDA is freely available from http://ab.inf.uni-tuebingen.de/software/pauda. Supplementary method details are also available from this website.