ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability.

Authors

Liu Zhihong, Zheng Minghao, Yan Xin, Gu Qiong, Gasteiger Johann, Tijhuis Johan, Maas Peter, Li Jiabo, Xu Jun

Affiliation

Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, 132 East Circle at University City, Guangzhou, 510006, China.

Publication information

J Comput Aided Mol Des. 2014 Sep;28(9):941-50. doi: 10.1007/s10822-014-9778-3. Epub 2014 Jul 17.

Abstract

Predicting compound chemical stability is important because unstable compounds can lead to either false-positive or false-negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H2O solutions stored at 50 °C for 105 days were used to predict stability by applying rule-embedded naïve Bayesian learning based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning to the data set, each ACF is assigned an expected stable probability (p(s)) and an unstable probability (p(uns)). A total of 13,340 ACFs, together with their p(s) and p(uns) data, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs were derived from its structure connection table with the same protocol used to derive ACFs from the training data. Then, the Bayesian classifier assigned p(s) and p(uns) values to the compound ACFs by a structural pattern recognition algorithm, which was implemented in-house. Compound instability is then calculated with Bayes' theorem from the p(s) and p(uns) values of the compound ACFs. We were able to achieve an AUC value of 84% and a tenfold cross-validation accuracy of 76.5%. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, thus further reducing the possibility of false negatives. To our knowledge, this is the first in silico service for predicting the stability of organic compounds.
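
The workflow described in the abstract (per-fragment probabilities stored in a knowledge base, combined with Bayes' theorem, with a rule-based override) can be illustrated with a minimal Python sketch. It is not the ChemStable implementation. Several things here are assumptions made for illustration only: fragments are passed in as plain string labels rather than derived from a connection table, p(s) and p(uns) are taken to be the smoothed frequencies with which a fragment occurs in stable and unstable training compounds, Laplace smoothing (`alpha`) is used, and the rule module is reduced to a set of hypothetical "alert" fragments that force an unstable prediction. The function names `train` and `p_unstable` and the fragment labels are likewise hypothetical.

```python
import math
from collections import Counter


def train(compounds, alpha=1.0):
    """Build a fragment knowledge base from (fragment_set, is_stable) pairs.

    Returns (kb, prior_stable), where kb maps each fragment to (p_s, p_uns):
    the smoothed probability of seeing the fragment in a stable or an
    unstable compound. Assumes both classes occur in the training data.
    """
    n_stable = sum(1 for _, stable in compounds if stable)
    n_unstable = len(compounds) - n_stable
    in_stable, in_unstable = Counter(), Counter()
    for frags, stable in compounds:
        # Count each fragment at most once per compound.
        (in_stable if stable else in_unstable).update(set(frags))
    kb = {}
    for frag in set(in_stable) | set(in_unstable):
        p_s = (in_stable[frag] + alpha) / (n_stable + 2 * alpha)
        p_uns = (in_unstable[frag] + alpha) / (n_unstable + 2 * alpha)
        kb[frag] = (p_s, p_uns)
    return kb, n_stable / len(compounds)


def p_unstable(frags, kb, prior_stable, alert_fragments=frozenset()):
    """Posterior probability that a compound is unstable.

    Hypothetical rule layer: any alert fragment forces the answer to 1.0.
    Otherwise the naive Bayes posterior is computed in log space from the
    fragments found in the knowledge base; unknown fragments are ignored.
    """
    frags = set(frags)
    if frags & set(alert_fragments):
        return 1.0
    log_s = math.log(prior_stable)
    log_u = math.log(1.0 - prior_stable)
    for frag in frags:
        if frag in kb:
            p_s, p_uns = kb[frag]
            log_s += math.log(p_s)
            log_u += math.log(p_uns)
    # Normalize the two unnormalized posteriors without underflow.
    top = max(log_s, log_u)
    e_s, e_u = math.exp(log_s - top), math.exp(log_u - top)
    return e_u / (e_s + e_u)


# Toy data with placeholder fragment labels (not real ACFs).
training = [
    ({"ester", "aryl"}, False),
    ({"ester", "alkyl"}, False),
    ({"amide", "aryl"}, True),
    ({"amide", "alkyl"}, True),
]
kb, prior_stable = train(training)
print(p_unstable({"ester", "aryl"}, kb, prior_stable))
print(p_unstable({"amide"}, kb, prior_stable))
```

Working in log space keeps the product of many small fragment probabilities from underflowing, which matters once a knowledge base holds thousands of fragments. Placing the rule check before the Bayesian step is one simple way to bias the classifier against missing unstable compounds; how the published server combines its rules with the classifier is not specified in the abstract.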

