Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization.

Affiliation

Atomwise Inc., 221 Main Street, Suite 1350, San Francisco, California 94105, United States.

Publication Information

J Chem Inf Model. 2018 May 29;58(5):916-932. doi: 10.1021/acs.jcim.7b00403. Epub 2018 May 8.

Abstract

Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems that accounts for the similarity among inactive molecules as well as active ones. We investigated seven widely used benchmarks for virtual screening and classification, and we show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods, irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques. Therefore, the previously reported performance of most ligand-based methods may be explained by overfitting to the benchmarks rather than by good prospective accuracy.
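For intuition, below is a minimal Python/RDKit sketch of an AVE-style redundancy score for a train/validation split. The paper's actual definition aggregates nearest-neighbor distances over a range of similarity thresholds; the mean nearest-neighbor Tanimoto similarity used here, and the helper names `fingerprints`, `mean_nn_similarity`, and `ave_like_bias`, are illustrative simplifications rather than the authors' exact formulation.

```python
# Hedged sketch of an AVE-style train/validation redundancy score.
# Assumption: mean nearest-neighbor Tanimoto similarity stands in for the
# paper's threshold-aggregated nearest-neighbor function.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprints(smiles_list, radius=2, n_bits=2048):
    """Morgan (ECFP-like) bit fingerprints for a list of SMILES strings."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    return [AllChem.GetMorganFingerprintAsBitVect(m, radius, nBits=n_bits)
            for m in mols if m is not None]

def mean_nn_similarity(val_fps, train_fps):
    """Average, over validation molecules, of the highest Tanimoto similarity
    to any training molecule: a simple training-validation redundancy proxy."""
    return sum(max(DataStructs.BulkTanimotoSimilarity(v, train_fps))
               for v in val_fps) / len(val_fps)

def ave_like_bias(val_actives, val_inactives, train_actives, train_inactives):
    """AVE-style bias: how much closer validation actives sit to training
    actives than to training inactives, plus the analogous term for
    inactives. Large positive values flag a redundant, 'easy' split."""
    return ((mean_nn_similarity(val_actives, train_actives)
             - mean_nn_similarity(val_actives, train_inactives))
            + (mean_nn_similarity(val_inactives, train_inactives)
               - mean_nn_similarity(val_inactives, train_actives)))
```

A score near zero suggests the validation actives and inactives are roughly equidistant from both training classes, while large positive values indicate a split on which memorizing the training set alone can score well.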

