In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening.

Affiliation

Universität Hamburg, ZBH - Center for Bioinformatics, Research Group for Computational Molecular Design, Bundesstraße 43, 20146 Hamburg, Germany.

Publication information

J Chem Inf Model. 2019 Mar 25;59(3):947-961. doi: 10.1021/acs.jcim.8b00712. Epub 2019 Mar 5.

DOI:10.1021/acs.jcim.8b00712

PMID:30835112

Abstract

Reports of successful applications of machine learning (ML) methods in structure-based virtual screening (SBVS) are increasing. ML methods such as convolutional neural networks show promising results and often outperform traditional methods such as empirical scoring functions in retrospective validation. However, trained ML models are often treated as black boxes and are not straightforwardly interpretable. In most cases, it is unknown which features in the data are decisive and whether a model's predictions are right for the right reason. Hence, we re-evaluated three widely used benchmark data sets in the context of ML methods and came to the conclusion that not every benchmark data set is suitable. Moreover, we demonstrate on two examples from current literature that bias is learned implicitly and unnoticed from standard benchmarks. On the basis of these results, we conclude that there is a need for eligible validation experiments and benchmark data sets suited to ML for more bias-controlled validation in ML-based SBVS. Therefore, we provide guidelines for setting up validation experiments and give a perspective on how new data sets could be generated.
