Tosca Elena M, Bartolucci Roberta, Magni Paolo
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, I-27100 Pavia, Italy.
Pharmaceutics. 2021 Jul 20;13(7):1101. doi: 10.3390/pharmaceutics13071101.
Machine learning (ML) approaches are receiving increasing attention from pharmaceutical companies and regulatory agencies, given their ability to mine knowledge from available data. In drug discovery, for example, they are employed in quantitative structure-property relationship (QSPR) models to predict biological properties from the chemical structure of a drug molecule. In this paper, following the Second Solubility Challenge (SC-2), a QSPR model based on artificial neural networks (ANNs) was built to predict the intrinsic solubility of the 100-compound low-variance tight set and the 32-compound high-variance loose set provided by SC-2 as test datasets. First, a training dataset of 270 drug-like molecules with experimentally determined intrinsic solubility values was gathered from the literature. Then, a standard three-layer feed-forward neural network was defined, using 10 ChemGPS physico-chemical descriptors as input features. The developed ANN showed adequate predictive performance on both SC-2 test datasets. Starting from this case study, the benefits and limitations of ML approaches are highlighted and discussed. The main findings confirmed that ML approaches are an attractive and promising tool to predict intrinsic solubility; however, many aspects, such as data quality, molecular descriptor computation and selection, and assessment of the applicability domain, are crucial but often neglected, and should be carefully considered to improve ML-based predictions.
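To make the model structure in the abstract concrete, the following is a minimal sketch of a three-layer feed-forward network (input, one hidden layer, one output) that maps 10 descriptor values per molecule to a single predicted solubility value. It is not the authors' implementation. Only the 10 inputs and the 270-molecule training set size come from the abstract; the hidden-layer size, the tanh activation, the plain gradient-descent training loop, and all function names are assumptions made for illustration. The data are synthetic placeholders, and no ChemGPS descriptors are computed here.

```python
import numpy as np

N_DESCRIPTORS = 10  # from the abstract: 10 ChemGPS descriptors as inputs
N_HIDDEN = 5        # assumption: the hidden-layer size is not given in the abstract


def init_params(rng, n_in=N_DESCRIPTORS, n_hidden=N_HIDDEN):
    """Small random weights for an input -> hidden -> output network."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, X):
    """Return predictions (one per row of X) and hidden activations."""
    H = np.tanh(X @ params["W1"] + params["b1"])  # assumed tanh hidden layer
    y_hat = H @ params["W2"] + params["b2"]       # linear output for regression
    return y_hat.ravel(), H


def train(params, X, y, lr=0.05, epochs=1000):
    """Full-batch gradient descent on mean squared error (illustrative only)."""
    n = X.shape[0]
    for _ in range(epochs):
        y_hat, H = forward(params, X)
        err = (y_hat - y)[:, None]                     # shape (n, 1)
        grad_W2 = (2.0 / n) * (H.T @ err)
        grad_b2 = (2.0 / n) * err.sum(axis=0)
        dH = (err @ params["W2"].T) * (1.0 - H ** 2)   # backprop through tanh
        grad_W1 = (2.0 / n) * (X.T @ dH)
        grad_b1 = (2.0 / n) * dH.sum(axis=0)
        params["W1"] -= lr * grad_W1
        params["b1"] -= lr * grad_b1
        params["W2"] -= lr * grad_W2
        params["b2"] -= lr * grad_b2
    return params


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def demo(seed=0):
    """Train on synthetic placeholder data; returns (RMSE before, RMSE after)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(270, N_DESCRIPTORS))        # 270 rows mirrors the training set size
    w_true = 0.3 * rng.normal(size=N_DESCRIPTORS)    # hypothetical ground truth
    y = np.tanh(X @ w_true)                          # synthetic target, not solubility data
    params = init_params(rng)
    before = rmse(y, forward(params, X)[0])
    params = train(params, X, y)
    after = rmse(y, forward(params, X)[0])
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"training RMSE before: {before:.3f}, after: {after:.3f}")
```

In practice the descriptor matrix would be computed from molecular structures and standardized before training, and performance would be judged on held-out sets such as the two SC-2 test sets rather than on the training data as done in this toy run.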