Orosz Álmos, Héberger Károly, Rácz Anita
Plasma Chemistry Research Group, Research Centre for Natural Sciences, Budapest, Hungary.
Front Chem. 2022 Jun 8;10:852893. doi: 10.3389/fchem.2022.852893. eCollection 2022.
Screening compounds against ADME-Tox targets plays an important role in drug design. QSPR (quantitative structure-property relationship) models can speed up this screening, although their performance depends strongly on several factors, including the molecular descriptors applied. In this study, the most popular descriptor groups were compared in detail for six main ADME-Tox classification targets: Ames mutagenicity, P-glycoprotein inhibition, hERG inhibition, hepatotoxicity, blood-brain barrier permeability, and cytochrome P450 2C9 inhibition. Literature-based, medium-sized binary classification datasets (each with more than 1,000 molecules) were used for model building with two common algorithms, XGBoost and the RPropMLP neural network. Five molecular representation sets were compared, along with their combined use: Morgan, Atompairs, and MACCS fingerprints; traditional 1D and 2D molecular descriptors; and, as a separate set, 3D molecular descriptors. Model performance was evaluated statistically using 18 different performance parameters. Although all the developed models performed close to the level typical of QSPR models for each specific ADME-Tox target, the results clearly showed that the traditional 1D, 2D, and 3D descriptors were superior when used with the XGBoost algorithm. The classical tools are worth trying in single-model building: for almost every dataset, 2D descriptors alone can produce even better models than the combination of all the examined descriptor sets.
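As an illustration of evaluating a binary classifier with several performance parameters at once, the following is a minimal sketch in plain Python. The abstract does not list the 18 parameters used in the study, so the metrics chosen here (accuracy, sensitivity, specificity, balanced accuracy, precision, F1, and the Matthews correlation coefficient) are an assumption based on common practice, and the function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute common confusion-matrix metrics for 0/1 class labels.

    Hypothetical helper: the metric selection is an assumption and is
    not taken from the study itself.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }
```

Calling `binary_metrics(y_true, y_pred)` with the true and predicted labels of a test set returns a dictionary with one entry per metric, so models built on different descriptor sets could be compared across the same set of parameters.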