Limits of Prediction for Machine Learning in Drug Discovery.

Author information

von Korff Modest, Sander Thomas

Affiliation

Idorsia Pharmaceuticals Ltd., Allschwil, Switzerland.

Publication information

Front Pharmacol. 2022 Mar 10;13:832120. doi: 10.3389/fphar.2022.832120. eCollection 2022.

In drug discovery, molecules are optimized towards desired properties, and machine learning is used in drug discovery projects to extrapolate towards those properties. The limits of extrapolation for regression models are well known, but a systematic analysis of how effective extrapolation is in drug discovery has not yet been performed. This study therefore examined the ability of six machine learning algorithms to extrapolate on 243 datasets. The response values, calculated from the molecules in the datasets, were molecular weight, cLogP, and the number of sp3 atoms. Three experimental setups were chosen for the response values: shuffled data were used for interpolation, whereas for extrapolation the data were sorted by response value, either from high to low or from low to high. Extrapolation with sorted data resulted in much larger prediction errors than interpolation with shuffled data. The study also demonstrated that linear machine learning methods are preferable for extrapolation.
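The three setups described in the abstract (shuffled, sorted low to high, sorted high to low) can be illustrated with a minimal sketch. It is not the study's code or data. It assumes a synthetic, additive response built from integer "count" features, loosely analogous to molecular weight as a sum of atomic contributions. Ordinary least squares and a k-nearest-neighbour regressor stand in for a linear and a non-linear method, and may or may not be among the six algorithms the study evaluated. All function names and parameters are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_predict_linear(X_tr, y_tr, X_te):
    """Ordinary least squares with an intercept (linear stand-in)."""
    A_tr = np.column_stack([X_tr, np.ones(len(X_tr))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    A_te = np.column_stack([X_te, np.ones(len(X_te))])
    return A_te @ coef


def fit_predict_knn(X_tr, y_tr, X_te, k=5):
    """k-nearest-neighbour regression (non-linear stand-in).

    Predictions are means of training responses, so they can never
    leave the range of responses seen during training.
    """
    dist = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_tr[nearest].mean(axis=1)


def split(y, mode, rng, test_frac=0.2):
    """Return (train_idx, test_idx) for one experimental setup."""
    n = len(y)
    n_test = int(n * test_frac)
    if mode == "shuffled":        # interpolation
        order = rng.permutation(n)
    elif mode == "low_to_high":   # test set holds the highest responses
        order = np.argsort(y)
    elif mode == "high_to_low":   # test set holds the lowest responses
        order = np.argsort(y)[::-1]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return order[: n - n_test], order[n - n_test:]


def run_experiment(seed=0, n=500, n_features=4):
    """Compare both models on all three setups using synthetic data."""
    rng = np.random.default_rng(seed)
    # Assumption: integer "count" features with an additive response.
    X = rng.integers(0, 20, size=(n, n_features)).astype(float)
    weights = rng.uniform(1.0, 30.0, size=n_features)
    y = X @ weights + rng.normal(0.0, 1.0, size=n)

    results = {}
    for mode in ("shuffled", "low_to_high", "high_to_low"):
        tr, te = split(y, mode, rng)
        results[mode] = {
            "linear": rmse(y[te], fit_predict_linear(X[tr], y[tr], X[te])),
            "knn": rmse(y[te], fit_predict_knn(X[tr], y[tr], X[te])),
        }
    return results


if __name__ == "__main__":
    for mode, scores in run_experiment().items():
        print(mode, {name: round(err, 2) for name, err in scores.items()})
```

In this toy setting the response is exactly linear in the features, which favours the linear model by construction. The sketch only shows how a sorted split forces a model to predict outside its training range. It does not reproduce the study's results.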
