A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia.
Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, Strasbourg, France.
SAR QSAR Environ Res. 2021 Mar;32(3):207-219. doi: 10.1080/1062936X.2021.1883107. Epub 2021 Feb 19.
In this article, we consider cross-validation of quantitative structure-property relationship (QSPR) models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two strategies of model cross-validation: 'transformation-out' CV and 'solvent-out' CV. Unlike the conventional k-fold CV approach, which does not consider the nature of the objects, the proposed procedures provide an unbiased estimate of the models' predictive performance for novel types of structural transformations in chemical reactions and for reactions carried out under new conditions. Both strategies have been applied to predict the rate constants of bimolecular elimination and nucleophilic substitution reactions, and of Diels-Alder cycloaddition. All suggested cross-validation methodologies, together with a tutorial, are implemented in the open-source software package CIMtools (https://github.com/cimm-kzn/CIMtools).
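Both strategies share the same idea. All reactions that carry the same label are kept together in one test fold, where the label is a transformation type for 'transformation-out' CV or a solvent for 'solvent-out' CV. The model is therefore always evaluated on labels it never saw during training. In conventional k-fold CV, by contrast, reactions with the same transformation or solvent can land in both the training and the test folds. The following is a minimal plain-Python sketch of such a grouped split, assuming each reaction already has a precomputed label. It is not the CIMtools API: the function name `group_out_folds`, the greedy fold-balancing rule and the solvent labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def group_out_folds(groups: Sequence[Hashable], n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical grouped CV splitter: every label (e.g. a solvent or a
    transformation type) appears in exactly one test fold, never in both
    the training and the test part of the same split."""
    # Collect sample indices per label.
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)
    if n_splits < 2 or n_splits > len(members):
        raise ValueError("n_splits must be between 2 and the number of distinct labels")

    # Assumed balancing rule: place the largest label groups first,
    # each into the fold that currently holds the fewest samples.
    fold_members: List[List[int]] = [[] for _ in range(n_splits)]
    for label in sorted(members, key=lambda k: (-len(members[k]), str(k))):
        smallest = min(range(n_splits), key=lambda f: len(fold_members[f]))
        fold_members[smallest].extend(members[label])

    # Each fold serves once as the test set; all remaining samples form the training set.
    splits = []
    for test in fold_members:
        test_set = set(test)
        train = [i for i in range(len(groups)) if i not in test_set]
        splits.append((train, sorted(test)))
    return splits


if __name__ == "__main__":
    # Hypothetical 'solvent-out' example: two reactions per solvent.
    solvents = ["water", "water", "DMSO", "DMSO", "MeCN", "MeCN"]
    for train, test in group_out_folds(solvents, 3):
        print("train:", train, "test:", test)
```

With these labels the test folds are `[2, 3]` (DMSO), `[4, 5]` (MeCN) and `[0, 1]` (water), so every solvent is predicted by a model trained without it. Replacing the solvent labels with transformation labels gives the 'transformation-out' variant of the same split.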