Exploiting ensemble learning to improve prediction of phospholipidosis inducing potential.

Affiliations

Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur 492001, India.

Publication details

J Theor Biol. 2019 Oct 21;479:37-47. doi: 10.1016/j.jtbi.2019.07.009. Epub 2019 Jul 13.

Abstract

Phospholipidosis is characterized by excessive accumulation of phospholipids in various tissues (lungs, liver, eyes, kidneys, etc.) induced by cationic amphiphilic drugs. Electron microscopy has revealed lamellar inclusion bodies as the hallmark of phospholipidosis. Some phospholipidosis-inducing compounds can cause tissue-specific inflammatory or retrogressive changes. Reliable and accurate in silico methods could enable early screening of phospholipidosis-inducing compounds and thereby speed up pharmaceutical drug discovery pipelines. In the present work, stacking ensembles were implemented to combine a number of different base learners into predictive models for phospholipidosis-inducing compounds (a total of 256 trained machine learning models were tested), using a wide range of molecular descriptors (ChemMine, JOELib, Open Babel and RDK descriptors) and structural alerts as input features. The best model, a stacked ensemble of machine learning algorithms with random forest as the second-level learner, outperformed the other base and ensemble learners. JOELib descriptors combined with structural alerts performed better than the other descriptor sets. The best ensemble model achieved an overall accuracy of 88.23%, sensitivity of 86.27%, specificity of 90.20%, MCC of 0.765, AUC of 0.896 and G-mean of 88.21%. To assess its robustness and stability, the best ensemble model was further evaluated using stratified 10×10-fold cross-validation and holdout test sets (repeated 10 times), achieving a mean accuracy of 84.83% (mean MCC 0.708) and 88.46% (mean MCC 0.771), respectively. A comparison of different meta-classifiers (generalized linear regression, gradient boosting machines, random forest and deep learning neural networks) in the stacking ensemble showed that random forest was the best of these for combining multiple classification models.
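The abstract gives no implementation details, so the following is only a minimal sketch of the general technique under stated assumptions. It uses scikit-learn as a stand-in toolkit (not necessarily what the authors used). The base learners (logistic regression, k-NN, Gaussian naive Bayes), the synthetic features from `make_classification`, and the helper names `binary_metrics`, `build_stack` and `repeated_cv_accuracy` are all hypothetical. The sketch shows a stacking ensemble with a random forest as the second-level (meta) learner, stratified 10×10-fold cross-validation, and the standard definitions of accuracy, sensitivity, specificity, MCC and G-mean (the square root of sensitivity × specificity).

```python
# Hypothetical sketch: scikit-learn stand-ins for a stacking ensemble with a
# random forest meta-learner. Base learners and synthetic data are assumptions,
# not the paper's descriptors, structural alerts or models.
import math

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, MCC and G-mean from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    g_mean = math.sqrt(sensitivity * specificity)  # geometric mean of TPR and TNR
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc, "g_mean": g_mean}


def build_stack(seed=0):
    """Level-0 base learners combined by a random forest level-1 (meta) learner."""
    base_learners = [
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("nb", GaussianNB()),
    ]
    meta_learner = RandomForestClassifier(n_estimators=100, random_state=seed)
    # cv=5: the meta-learner is fit on out-of-fold predicted probabilities
    # of the base learners, so it never sees their in-sample predictions.
    return StackingClassifier(estimators=base_learners, final_estimator=meta_learner,
                              stack_method="predict_proba", cv=5)


def repeated_cv_accuracy(X, y, seed=0):
    """Stratified 10x10-fold cross-validation: returns 100 accuracy scores."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=seed)
    return cross_val_score(build_stack(seed), X, y, cv=cv, scoring="accuracy")
```

For example, `repeated_cv_accuracy(*make_classification(n_samples=200, random_state=0)).mean()` gives a mean accuracy over the 100 folds on synthetic data. The counts in `binary_metrics(tp=44, fn=7, tn=46, fp=5)` are illustrative only, because the abstract does not report a confusion matrix. They give sensitivity ≈ 0.863, specificity ≈ 0.902, MCC ≈ 0.765 and G-mean ≈ 0.882, which lets you sanity-check the formulas against the figures quoted above. AUC cannot be computed from a confusion matrix alone because it needs ranked scores.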
