Computational Systems Biology Lab. NAIST, Ikoma, 630-0129, Japan.
Mol Inform. 2022 Jul;41(7):e2100247. doi: 10.1002/minf.202100247. Epub 2022 Jan 28.
Plants produce numerous types of secondary metabolites that have pharmacological importance in drug development for different diseases. Computational methods widely use metabolite fingerprints to understand the properties of metabolites and the similarities among them, and to predict chemical reactions, among other tasks. In this work, we developed three different deep neural network (DNN) models to predict the antibacterial property of plant metabolites. The first DNN model uses the fingerprint set of the metabolites as features. For the second DNN model, we measured the similarity among fingerprints using correlation and used one representative feature from each group of highly correlated fingerprints. For the third model, the fingerprints of the metabolites were used to find clusters of structurally similar chemical compounds; from each cluster, a representative metabolite was selected and made part of the training dataset. In both cases, we applied a simple graph clustering method to cluster the corresponding network. The correlation-based (second) DNN model reduced the number of features while retaining performance almost similar to that of the first DNN model. The third model improved classification results on test data by capturing wider variance within the training data through graph clustering. This third model is a somewhat novel approach and can be applied to build DNN models for other purposes.
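The two selection steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the graph clustering algorithm, the similarity measures, the thresholds, or the rule for choosing a representative, so everything specific here is an assumption made for illustration: connected components stand in for the "simple graph clustering method", Pearson correlation links fingerprint bits, Tanimoto similarity links compounds, the thresholds 0.9 and 0.7 are arbitrary, and the first member of each cluster is taken as its representative. The function names and the toy matrix `X` are hypothetical.

```python
import numpy as np


def connected_components(adj):
    """Label the connected components of an undirected graph.

    `adj` is a square boolean adjacency matrix. Connected components are an
    assumed stand-in for the clustering method, which the abstract leaves open.
    """
    n = adj.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def first_members(labels):
    """Index of the first member of each cluster (assumed representative rule)."""
    _, first = np.unique(labels, return_index=True)
    return np.sort(first)


def representative_features(X, threshold=0.9):
    """Second-model idea: keep one fingerprint bit per group of correlated bits."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(X, rowvar=False)  # bit-by-bit Pearson correlation
    corr = np.nan_to_num(corr)  # constant bits yield NaN; treat as uncorrelated
    labels = connected_components(corr >= threshold)
    return first_members(labels), labels


def tanimoto_matrix(X):
    """Pairwise Tanimoto similarity between binary fingerprint rows."""
    Xf = X.astype(float)
    inter = Xf @ Xf.T
    counts = Xf.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


def representative_compounds(X, threshold=0.7):
    """Third-model idea: keep one compound per cluster of similar compounds."""
    labels = connected_components(tanimoto_matrix(X) >= threshold)
    return first_members(labels), labels


# Toy data: 6 hypothetical compounds (rows) x 5 fingerprint bits (columns).
X = np.array([
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
])

kept_bits, bit_groups = representative_features(X)
kept_compounds, compound_clusters = representative_compounds(X)
print("representative bits:", kept_bits.tolist())
print("representative compounds:", kept_compounds.tolist())
```

In the toy matrix, bits 0 and 1 are identical, as are bits 2 and 3, so each pair collapses to a single feature. Compounds 0 and 2 share the same fingerprint and fall into one cluster, so only one of them would enter the training set. A real pipeline would compute the fingerprints with a cheminformatics toolkit and would likely use a more deliberate rule for choosing each representative.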