Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC V6T 1Z3, Canada.
Department of Microbiology & Immunology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada.
Bioinformatics. 2021 May 5;37(6):822-829. doi: 10.1093/bioinformatics/btaa906.
Metabolic pathway reconstruction from genomic sequence information is a key step in predicting the regulatory and functional potential of cells at the individual, population and community levels of organization. The most common reconstruction methods are gene-centric, e.g. mapping annotated proteins onto known pathways using a reference database. Pathway-centric methods, which use heuristics or machine learning to infer pathway presence, provide a powerful engine for hypothesis generation in biological systems. However, pathway-centric methods rely on rule sets or rich feature information that may not be known or readily accessible.
Here, we present pathway2vec, a software package consisting of six representation learning modules that automatically generate features for pathway inference. Specifically, we build a three-layered network composed of compounds, enzymes and pathways, where nodes within a layer are connected by intra-layer interactions and nodes in different layers are connected by inter-layer interactions. This layered architecture captures the relationships used to learn a low-dimensional, neural embedding-based space of metabolic features. We benchmark pathway2vec on node clustering, embedding visualization and pathway prediction, using MetaCyc as a trusted source. In the pathway prediction task, the results indicate that embeddings can be leveraged to improve prediction outcomes.
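The layered network described above can be illustrated with a minimal sketch. It is not the pathway2vec implementation: the node identifiers, the edge lists and the helper names (`build_graph`, `layer_of`, `random_walk`, `generate_walks`) are all hypothetical, and uniform random walks stand in for whichever sampling strategies the six modules actually use. The sketch only shows how a compound/enzyme/pathway graph with intra-layer and inter-layer edges might be turned into node sequences of the kind that embedding models commonly consume.

```python
import random
from collections import defaultdict

# Hypothetical toy data: identifiers are illustrative, not taken from MetaCyc.
# Prefixes mark the layer: C = compound, E = enzyme, P = pathway.
INTRA_LAYER_EDGES = [
    ("C:glucose", "C:g6p"),
    ("C:g6p", "C:f6p"),
    ("E:2.7.1.1", "E:5.3.1.9"),
    ("P:glycolysis", "P:gluconeogenesis"),
]
INTER_LAYER_EDGES = [
    ("C:glucose", "E:2.7.1.1"),
    ("C:g6p", "E:2.7.1.1"),
    ("C:g6p", "E:5.3.1.9"),
    ("C:f6p", "E:5.3.1.9"),
    ("E:2.7.1.1", "P:glycolysis"),
    ("E:5.3.1.9", "P:glycolysis"),
    ("E:5.3.1.9", "P:gluconeogenesis"),
]


def build_graph(*edge_lists):
    """Merge undirected edge lists into a node -> sorted neighbor list mapping."""
    adjacency = defaultdict(set)
    for edges in edge_lists:
        for u, v in edges:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return {node: sorted(neighbors) for node, neighbors in adjacency.items()}


def layer_of(node):
    """Return the layer prefix ('C', 'E' or 'P') of a node identifier."""
    return node.split(":", 1)[0]


def random_walk(graph, start, length, rng):
    """Uniform random walk that may cross layers through inter-layer edges."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = sorted(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks


graph = build_graph(INTRA_LAYER_EDGES, INTER_LAYER_EDGES)
walks = generate_walks(graph, walks_per_node=2, length=5)
print(len(graph), "nodes,", len(walks), "walks")
print("example walk:", " -> ".join(walks[0]))
```

Each walk is a sequence of node identifiers that mixes compounds, enzymes and pathways. Under the assumptions above, such sequences could be passed to a skip-gram style model so that nodes appearing in similar contexts receive nearby vectors.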
The software package and installation instructions are available at http://github.com/pathway2vec.
Supplementary data are available at Bioinformatics online.