

Classification of alkaloids according to the starting substances of their biosynthetic pathways using graph convolutional neural networks.

Affiliations

Division of Science and Technology, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Nara, 630-0192, Japan.

Data Science Center, Nara Institute of Science and Technology, Ikoma, Nara, 630-0192, Japan.

Publication information

BMC Bioinformatics. 2019 Jul 9;20(1):380. doi: 10.1186/s12859-019-2963-6.

Abstract

BACKGROUND

Alkaloids, a class of organic compounds that contain nitrogen bases, are mainly synthesized as secondary metabolites in plants and fungi, and they have a wide range of bioactivities. Although there are thousands of compounds in this class, few of their biosynthetic pathways have been fully identified. In this study, we constructed a model to predict their precursors based on a novel kind of neural network called the molecular graph convolutional neural network. Molecular similarity is a crucial metric in the analysis of qualitative structure-activity relationships. However, current fingerprint representations sometimes struggle to efficiently emphasize the features that matter for a specific target problem. It is therefore advantageous to let the model select appropriate features through data-driven decisions, because extracting more useful information can substantially influence the outcome of a classification or regression problem.
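
For context on the fixed-fingerprint baseline the background contrasts against: a common way to score similarity between two binary fingerprints is the Tanimoto (Jaccard) coefficient, T = |A ∩ B| / |A ∪ B|. The abstract does not say which similarity measure is meant, so treating it as Tanimoto is an assumption. The minimal sketch below represents each fingerprint as the set of its "on" bit indices; the bit indices and the function name are hypothetical, and the empty-fingerprint rule is only a convention chosen here.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints,
    each given as the set of indices of its 'on' bits."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    shared = len(a & b)
    # |A | B| = |A| + |B| - |A & B|
    return shared / (len(a) + len(b) - shared)


# Hypothetical on-bit indices for two molecules (not real fingerprint output).
fp1 = {3, 17, 42, 99}
fp2 = {3, 17, 56}
print(tanimoto(fp1, fp2))  # 2 shared / (4 + 3 - 2) = 0.4
```

Because the bit definitions of such a fingerprint are fixed before any training, the score cannot adapt to which substructures matter for a given task, which is the limitation that motivates learning the features from data.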

RESULTS

In this study, we applied a neural network architecture to undirected graph representations of molecules. By encoding a molecule as an abstract graph, applying "convolution" on the graph, and training the weights of the neural network, the network can optimize feature selection for the training problem. By recursively incorporating the effects of adjacent atoms, graph convolutional neural networks can efficiently extract latent atom features that represent the chemical features of a molecule. To investigate alkaloid biosynthesis, we trained the network to distinguish the precursors of 566 alkaloids, which comprise nearly all alkaloids whose biosynthetic pathways are known, and showed that the model could predict starting substances with an average accuracy of 97.5%.
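
The recursive neighbor aggregation described above can be illustrated with a minimal, pure-Python sketch. It assumes a simple update in which each atom adds a transformed copy of its own features to a transformed sum of its bonded neighbors' features, followed by ReLU, a sum readout over atoms, and a softmax over precursor classes. This is a generic graph convolution, not the authors' architecture. The toy molecule, one-hot atom features, weights (`w_self`, `w_nbr`, `w_out`), the three output classes, and all function names are hypothetical, and both layers share one set of weights only to keep the example short.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def graph_conv(adj: Matrix, h: Matrix, w_self: Matrix, w_nbr: Matrix) -> Matrix:
    """One convolution step: each atom adds a transformed copy of its own features
    to a transformed sum of its bonded neighbors' features, then applies ReLU."""
    nbr_sum = matmul(adj, h)  # row i = sum of features of atoms bonded to atom i
    self_part = matmul(h, w_self)
    nbr_part = matmul(nbr_sum, w_nbr)
    return [[max(0.0, s + m) for s, m in zip(sp, npart)]
            for sp, npart in zip(self_part, nbr_part)]


def readout(h: Matrix) -> List[float]:
    """Sum atom features into one fixed-length, molecule-level 'fingerprint'."""
    return [sum(row[j] for row in h) for j in range(len(h[0]))]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


# Toy molecule C-C-N as an undirected graph; atom features are one-hot [is_C, is_N].
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Hypothetical, untrained weights (training would fit these to the precursor labels).
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_nbr = [[0.5, 0.0], [0.0, 0.5]]
w_out = [[0.2, -0.1, 0.0], [-0.3, 0.4, 0.1]]  # 2 features -> 3 hypothetical classes

for _ in range(2):  # two layers: each atom sees atoms up to two bonds away
    h = graph_conv(adj, h, w_self, w_nbr)

fingerprint = readout(h)
probs = softmax(matmul([fingerprint], w_out)[0])
print(fingerprint)  # [6.0, 2.5]
print([round(p, 3) for p in probs])
```

Because the readout sums over atoms, the resulting vector does not depend on how the atoms are numbered. In a trained model the weights, and therefore which neighborhood patterns the vector emphasizes, are fitted to the classification labels rather than fixed in advance as in a conventional fingerprint.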

CONCLUSION

We have shown that our model predicts more accurately than a random forest and a general neural network when variables and fingerprints are not preselected, while performance is comparable when we carefully select 507 variables from an 18,000-dimensional descriptor set. Predicting pathways contributes to understanding alkaloid synthesis mechanisms, so applying graph-based neural network models to similar problems in bioinformatics would be beneficial. We applied our model to evaluate the biosynthetic precursors of 12,000 alkaloids found in various organisms and found a power-law-like distribution.


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e80/6617615/24c33eb70dcb/12859_2019_2963_Fig1_HTML.jpg
