Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning.

Affiliations

B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain.

Mind the Byte S.L., 08007 Barcelona, Spain.

Publication information

J Chem Inf Model. 2019 Apr 22;59(4):1645-1657. doi: 10.1021/acs.jcim.8b00663. Epub 2019 Feb 22.

DOI:10.1021/acs.jcim.8b00663
PMID:30730731
Abstract

Binding prediction between targets and drug-like compounds through deep neural networks has generated promising results in recent years, outperforming traditional machine learning-based methods. However, the generalization capability of these classification models is still an issue to be addressed. In this work, we explored how different cross-validation strategies applied to data from different molecular databases affect the performance of binding prediction proteochemometrics models. These strategies are (1) random splitting, (2) splitting based on K-means clustering (of both actives and inactives), (3) splitting based on source database, and (4) splitting based on both the clustering and the source database. These schemas are applied to a deep learning proteochemometrics model and to a simple logistic regression model used as a baseline. Additionally, two different ways of describing molecules in the model are tested: (1) by their SMILES and (2) by three fingerprints. The classification performance of our deep learning-based proteochemometrics model is comparable to the state of the art. Our results show that the lack of generalization of these models is due to a bias in public molecular databases and that a restrictive cross-validation schema based on compound clustering leads to worse but more robust and credible results. Our results also show better performance when representing molecules by their fingerprints.
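
The four splitting strategies named in the abstract differ mainly in what is allowed to be shared between training and test folds. The sketch below is an illustration only, not the authors' implementation: it assumes cluster ids have already been computed (the K-means step itself is not shown), and the function names, toy cluster ids, and database labels "A", "B", "C" are all hypothetical. Random splitting deals individual samples into folds, while the grouped variant keeps every cluster, source database, or (cluster, source) pair inside a single fold.

```python
import random
from collections import defaultdict


def random_folds(n_samples, k, seed=0):
    """Strategy (1): shuffle sample indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def grouped_folds(groups, k):
    """Strategies (2)-(4): keep every group inside a single fold.

    groups[i] is the group key of sample i: a precomputed cluster id,
    a source database name, or a (cluster, source) tuple.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    folds = [[] for _ in range(k)]
    # Greedy balancing: place the largest groups first,
    # each into the fold that is currently smallest.
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[g])
    return [sorted(f) for f in folds]


def train_test_pairs(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test


# Toy data (hypothetical): 8 compounds with assumed cluster ids and sources.
cluster = [0, 0, 1, 1, 2, 2, 3, 3]
source = ["A", "A", "A", "B", "B", "B", "C", "C"]

by_random = random_folds(len(cluster), k=2)                # strategy (1)
by_cluster = grouped_folds(cluster, k=2)                   # strategy (2)
by_source = grouped_folds(source, k=2)                     # strategy (3)
by_both = grouped_folds(list(zip(cluster, source)), k=2)   # strategy (4)
```

Under the grouped splits, no test compound shares a cluster (or source database) with any training compound, which is the kind of restriction the abstract associates with lower but more credible performance estimates.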

Similar articles

1. Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning.
   J Chem Inf Model. 2019 Apr 22;59(4):1645-1657. doi: 10.1021/acs.jcim.8b00663. Epub 2019 Feb 22.
2. A Deep Learning-Based Chemical System for QSAR Prediction.
   IEEE J Biomed Health Inform. 2020 Oct;24(10):3020-3028. doi: 10.1109/JBHI.2020.2977009. Epub 2020 Feb 28.
3. Proteochemometrics - recent developments in bioactivity and selectivity modeling.
   Drug Discov Today Technol. 2019 Dec;32-33:89-98. doi: 10.1016/j.ddtec.2020.08.003. Epub 2020 Sep 20.
4. Learning to SMILES: BAN-based strategies to improve latent representation learning from molecules.
   Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab327.
5. DL-SMILES#: A Novel Encoding Scheme for Predicting Compound Protein Affinity Using Deep Learning.
   Comb Chem High Throughput Screen. 2022;25(4):642-650. doi: 10.2174/1386207324666210219102728.
6. Convolutional neural network based on SMILES representation of compounds for detecting chemical motif.
   BMC Bioinformatics. 2018 Dec 31;19(Suppl 19):526. doi: 10.1186/s12859-018-2523-5.
7. Deep Learning-Based Modeling of Drug-Target Interaction Prediction Incorporating Binding Site Information of Proteins.
   Interdiscip Sci. 2023 Jun;15(2):306-315. doi: 10.1007/s12539-023-00557-z. Epub 2023 Mar 26.
8. Shallow Representation Learning via Kernel PCA Improves QSAR Modelability.
   J Chem Inf Model. 2017 Aug 28;57(8):1859-1867. doi: 10.1021/acs.jcim.6b00694. Epub 2017 Aug 7.
9. A comprehensive support vector machine binary hERG classification model based on extensive but biased end point hERG data sets.
   Chem Res Toxicol. 2011 Jun 20;24(6):934-49. doi: 10.1021/tx200099j. Epub 2011 May 6.
10. Novel Consensus Architecture To Improve Performance of Large-Scale Multitask Deep Learning QSAR Models.
    J Chem Inf Model. 2019 Nov 25;59(11):4613-4624. doi: 10.1021/acs.jcim.9b00526. Epub 2019 Oct 25.

Cited by

1. The Role of Deep Cerebral Tracts in Predicting Postoperative Aphasia: An nTMS-Based Investigation of the Corticothalamic Fibers.
   Hum Brain Mapp. 2025 Sep;46(13):e70344. doi: 10.1002/hbm.70344.
2. Balancing Data on Deep Learning-Based Proteochemometric Activity Classification.
   J Chem Inf Model. 2021 Apr 26;61(4):1657-1669. doi: 10.1021/acs.jcim.1c00086. Epub 2021 Mar 29.
3. The Random Forest Model Has the Best Accuracy Among the Four Pressure Ulcer Prediction Models Using Machine Learning Algorithms.
   Risk Manag Healthc Policy. 2021 Mar 18;14:1175-1187. doi: 10.2147/RMHP.S297838. eCollection 2021.
4. The effect of statistical normalization on network propagation scores.
   Bioinformatics. 2021 May 5;37(6):845-852. doi: 10.1093/bioinformatics/btaa896.
5. Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction.
   Sci Rep. 2020 Sep 3;10(1):14634. doi: 10.1038/s41598-020-71450-8.
6. Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design.
   J Chem Inf Model. 2020 Sep 28;60(9):4200-4215. doi: 10.1021/acs.jcim.0c00411. Epub 2020 Sep 10.
7. STarFish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products.
   J Chem Inf Model. 2019 Nov 25;59(11):4906-4920. doi: 10.1021/acs.jcim.9b00489. Epub 2019 Oct 24.