Feldbauer Roman, Gosch Lukas, Lüftinger Lukas, Hyden Patrick, Flexer Arthur, Rattei Thomas
Department of Microbiology and Ecosystem Science, University of Vienna, Vienna 1090, Austria.
Ares Genetics GmbH, Vienna 1030, Austria.
Bioinformatics. 2021 Apr 1;36(22-23):5304-5312. doi: 10.1093/bioinformatics/btaa1051.
Protein orthologous group databases are powerful tools for evolutionary analysis, functional annotation or metabolic pathway modeling across lineages. Sequences are typically assigned to orthologous groups with alignment-based methods, such as profile hidden Markov models, which have become a computational bottleneck.
We present DeepNOG, a fast and accurate alignment-free orthology assignment method based on deep convolutional networks. We compare DeepNOG against state-of-the-art alignment-based methods (HMMER, DIAMOND) and alignment-free methods (DeepFam) on two orthology databases (COG, eggNOG 5). DeepNOG scales to large orthology databases such as eggNOG, on which it outperforms DeepFam in precision and recall by large margins. While alignment-based methods still provide the most accurate assignments among the investigated methods, the computing time of DeepNOG is an order of magnitude lower on CPUs. Optional GPU usage increases throughput further. A command-line tool enables rapid adoption by users.
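The alignment-free idea described above can be illustrated with a toy sketch: a sequence is one-hot encoded, a convolutional filter is slid along it, and the strongest response per group decides the assignment. This is not DeepNOG's architecture or code. The filters here are hand-built from short motifs rather than learned, there is a single filter per group and no deep network, and the group names (GROUP_A, GROUP_B), motifs, and function names are hypothetical placeholders chosen only for illustration.

```python
# Toy sketch of alignment-free group assignment with convolution-style filters.
# Assumption: filters are hand-built from motifs; a real model would learn them.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a protein sequence as a list of 20-dimensional one-hot rows.

    Residues outside the 20 standard amino acids become all-zero rows.
    """
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_global_max(encoded, kernel):
    """Slide one filter (width x 20) over the sequence, apply ReLU,
    and return the maximum activation (global max pooling)."""
    width = len(kernel)
    best = 0.0  # ReLU floor; also the result if the sequence is shorter than the filter
    for start in range(len(encoded) - width + 1):
        activation = sum(
            kernel[k][c] * encoded[start + k][c]
            for k in range(width)
            for c in range(len(AMINO_ACIDS))
        )
        best = max(best, activation)
    return best


def motif_filter(motif):
    """Hypothetical hand-built filter that responds most strongly to `motif`."""
    return one_hot(motif)


def assign_group(seq, group_filters):
    """Score the sequence against each group's filter and return the
    best-scoring group label together with all scores."""
    encoded = one_hot(seq)
    scores = {group: conv1d_global_max(encoded, kernel)
              for group, kernel in group_filters.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    filters = {"GROUP_A": motif_filter("GKS"), "GROUP_B": motif_filter("HWD")}
    label, scores = assign_group("MAAGKSTLL", filters)
    print(label, scores)
```

No pairwise or profile alignment is computed at any point: the cost per sequence depends only on sequence length and the number of filters, which is one reason convolution-based approaches of this kind can be fast and map well to GPUs.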
Source code and packages are freely available at https://github.com/univieCUBE/deepnog. The platform-independent Python program can be installed with: pip install deepnog
Supplementary data are available at Bioinformatics online.