Yang Ying, Jiang Xiao-Tao, Zhang Tong
Environmental Biotechnology Laboratory, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China.
PLoS One. 2014 Oct 27;9(10):e110947. doi: 10.1371/journal.pone.0110947. eCollection 2014.
The rapid development of next-generation sequencing (NGS) has dramatically broadened the application of metagenomics. Functional annotation is a major step in metagenomic studies, but fast annotation of functional genes remains a challenge because of the deluge of NGS data and the continued expansion of reference databases. In this study, a hybrid annotation pipeline previously proposed for taxonomic assignment was evaluated for annotating metagenomic sequences against specific functional genes, such as antibiotic resistance genes, arsenic resistance genes, and key genes in nitrogen metabolism. When a small protein database of the specific functional genes is used, the hybrid approach, which combines UBLAST and BLASTX, is 44-177 times faster than direct BLASTX, at the cost of missing a small portion (<1.8%) of the target sequences identified by direct BLASTX. Unlike direct BLASTX, the time the hybrid pipeline requires for annotating specific functional genes depends on the abundance of the target genes. The hybrid pipeline is therefore better suited to annotating specific functional genes than to comprehensive functional gene annotation.
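The dependence of run time on target-gene abundance can be illustrated with a minimal two-stage sketch. It assumes the hybrid pipeline runs a cheap screen over every read and passes only the surviving candidates to the costly search; the abstract does not spell out this ordering, so treat it as an assumption. The function names (`hybrid_annotate`, `toy_fast_screen`, `toy_precise_annotate`) and the `MARKER` seed are hypothetical stand-ins and do not call or reproduce UBLAST or BLASTX.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def hybrid_annotate(
    reads: Iterable[str],
    fast_screen: Callable[[str], bool],
    precise_annotate: Callable[[str], Optional[str]],
) -> Tuple[List[Tuple[str, str]], int, int]:
    """Screen every read cheaply; run the costly step only on candidates.

    Returns (annotations, reads_screened, precise_calls).
    """
    annotations: List[Tuple[str, str]] = []
    reads_screened = 0
    precise_calls = 0
    for read in reads:
        reads_screened += 1
        if not fast_screen(read):
            # Rejected by the cheap screen: the costly step never sees it.
            # A true target rejected here is lost, which is the sensitivity
            # trade-off of any prefilter design.
            continue
        precise_calls += 1
        label = precise_annotate(read)
        if label is not None:
            annotations.append((read, label))
    return annotations, reads_screened, precise_calls


# Toy stand-ins (hypothetical; real tools align reads against a protein database).
MARKER = "ATGC"


def toy_fast_screen(read: str) -> bool:
    # Permissive test: any occurrence of the seed makes the read a candidate.
    return MARKER in read


def toy_precise_annotate(read: str) -> Optional[str]:
    # Stricter test standing in for the slow, accurate search.
    return "target_gene" if read.startswith(MARKER * 2) else None


reads = ["ATGCATGCTTAA", "TTTTATGCGGGG", "GGGGCCCCAAAA", "CCCCTTTTGGGG"]
annotations, reads_screened, precise_calls = hybrid_annotate(
    reads, toy_fast_screen, toy_precise_annotate
)
print(annotations, reads_screened, precise_calls)
```

In this toy run only two of the four reads reach the costly step. The number of costly calls grows with the share of reads that resemble the targets, which is one way to read the abstract's observation that the hybrid pipeline's run time tracks target-gene abundance.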