Sharma Ashok K, Gupta Ankit, Kumar Sanjiv, Dhakan Darshan B, Sharma Vineet K
MetaInformatics Laboratory, Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, Madhya Pradesh, India.
Genomics. 2015 Jul;106(1):1-6. doi: 10.1016/j.ygeno.2015.04.001. Epub 2015 Apr 8.
Functional annotation of very large metagenomic datasets is a time-consuming and computationally demanding task, and it is currently a bottleneck for efficient analysis. The commonly used homology-based methods for functionally annotating and classifying proteins are extremely slow. Therefore, to achieve faster and more accurate functional annotation, we have developed an orthology-based functional classifier, 'Woods', which combines machine learning with similarity-based approaches. Woods displayed a precision of 98.79% on an independent genomic dataset, 96.66% on a simulated metagenomic dataset, and >97% on two real metagenomic datasets. In addition, it ran more than 87 times faster than BLAST on the two real metagenomic datasets. Woods can therefore serve as a highly efficient and accurate classifier whose high-throughput capability makes it practical for large metagenomic datasets.