Hou Yujie, Zhang Xiong, Zhou Qinyan, Hong Wenxing, Wang Ying
Department of Automation, Xiamen University, Xiamen, China.
Department of Automation, University of Science and Technology of China, Hefei, China.
Front Genet. 2021 Jan 18;11:608512. doi: 10.3389/fgene.2020.608512. eCollection 2020.
Matching 16S rRNA gene sequencing data to a metabolic reference database is a meaningful way to predict the metabolic function of bacteria and archaea, giving greater insight into how a microbial community works. However, some operational taxonomic units (OTUs) cannot be functionally profiled, especially in microbial communities from non-human samples cultured in defective media. We therefore report the development of Hierarchical micrObial functions Prediction by graph aggregated Embedding (HOPE), which uses co-occurrence patterns and nucleotide sequences to predict microbial functions. HOPE integrates the topological structure of microbial co-occurrence networks with the k-mer compositions of OTU sequences and embeds them into a lower-dimensional continuous latent space while maximally preserving the topological relationships among OTUs. Our framework explicitly accounts for the strong imbalance among the KEGG Orthology (KO) functions of microbes, which usually degrades prediction performance. A hierarchical multitask learning module is used in HOPE to alleviate the challenge posed by the long-tailed distribution among classes. To test the performance of HOPE, we compare it with HOPE-one, HOPE-seq, and GraphSAGE on three microbial metagenomic 16S rRNA sequencing datasets: abalone gut, human gut, and the gut of a third host species. Experiments demonstrate that HOPE outperforms the baselines on almost all metrics in all experiments. Furthermore, HOPE shows strong generalization ability. HOPE's basic idea is suitable for other related scenarios, such as the prediction of gene function based on gene co-expression networks. The source code of HOPE is freely available at https://github.com/adrift00/HOPE.
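The two inputs the abstract names, k-mer compositions of OTU sequences and the topology of a co-occurrence network, can be illustrated with a minimal sketch. This is not the HOPE implementation: it has no trainable weights, and it uses a plain GraphSAGE-style mean aggregation as a stand-in for the graph aggregated embedding. The function names `kmer_composition` and `mean_aggregate`, the toy sequences, the adjacency matrix, and the choice k = 2 are all assumptions made for illustration.

```python
from itertools import product

import numpy as np


def kmer_composition(seq, k=3):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-mers with ambiguous bases are skipped
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def mean_aggregate(features, adjacency):
    """Concatenate each node's features with the mean of its neighbors' features."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1, keepdims=True)
    # Isolated nodes get a zero neighbor vector instead of a division by zero.
    neighbor_mean = adjacency @ features / np.maximum(degree, 1.0)
    return np.concatenate([features, neighbor_mean], axis=1)


# Hypothetical toy data: three OTU sequences and a co-occurrence graph
# in which OTU 1 co-occurs with OTU 0 and OTU 2.
otu_sequences = ["ACGTACGTAC", "ACGTTTGTAC", "GGGCCCGGGC"]
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]

features = np.stack([kmer_composition(s, k=2) for s in otu_sequences])
embeddings = mean_aggregate(features, adjacency)
print(embeddings.shape)  # (3, 32)
```

Each row of `embeddings` combines an OTU's own sequence composition with that of its co-occurring neighbors, so a classifier trained on these rows sees both signals the abstract describes. A learned model would replace the fixed mean with trainable aggregation layers and project the result into a lower-dimensional space.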