Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India.
Mol Divers. 2022 Oct;26(5):2847-2862. doi: 10.1007/s11030-022-10478-6. Epub 2022 Jun 29.
Quantitative structure-activity relationship (QSAR) and read-across techniques have recently been merged into an emerging field, read-across structure-activity relationship (RASAR), which uses the chemical similarity concepts of read-across (an unsupervised step) to develop a supervised learning model (as in QSAR). So far, the RASAR method has been used only for graded predictions or classification modeling. In this work, we attempt, for the first time, to apply RASAR to quantitative predictions (q-RASAR), using androgen receptor binding affinity data as a case study. We computed several error-based and similarity-based measures: the weighted standard deviation of the predicted values, the coefficient of variation of the computed predictions, the average similarity level of the close training compounds for each query molecule, the standard deviation and coefficient of variation of the similarity levels, the maximum similarity levels to positive and negative close training compounds, and a concordance measure indicating similarity to positive, negative, or both classes of close training compounds, among others. We combined these additional measures with the chemical descriptors selected in the previously developed QSAR model, redeveloped partial least squares models from the training set, and predicted the endpoint for the query data set. Interestingly, these new models outperform the original QSAR model in both internal and external validation quality. In this study, we also introduce a new similarity-based concordance measure (the Banerjee-Roy coefficient) that can contribute significantly to model quality. A q-RASAR model also has an advantage over read-across predictions in that it is easy to interpret and indicates the quantitative contributions of important chemical features. The strategy described here should be applicable to modeling other biological, toxicological, and property data for enhanced quality of predictions, easy interpretability, and efficient transferability.
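As a rough illustration of the similarity-based and error-based measures listed in the abstract, the following Python sketch computes a few of them for one query compound. It is not the authors' implementation. The function name `rasar_descriptors`, the Euclidean-distance similarity `1 / (1 + d)`, the choice of `k` close training compounds, and the use of the training-set mean as the positive/negative threshold are all assumptions made for this example. The Banerjee-Roy coefficient is omitted because its definition is not given in the abstract.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rasar_descriptors(query, train_X, train_y, k=3, threshold=None):
    """Similarity- and error-based measures for one query compound.

    Assumptions (not taken from the paper): similarity = 1 / (1 + distance),
    the k most similar training compounds count as "close", and a training
    compound is "positive" when its response is >= threshold, which
    defaults to the training-set mean.
    """
    if threshold is None:
        threshold = sum(train_y) / len(train_y)

    # Similarity of the query to every training compound, paired with its response.
    sims = [(1.0 / (1.0 + euclidean(query, x)), y) for x, y in zip(train_X, train_y)]
    close = sorted(sims, key=lambda t: t[0], reverse=True)[:k]

    s = [sim for sim, _ in close]
    y = [resp for _, resp in close]
    total = sum(s)
    w = [si / total for si in s]  # similarity weights, summing to 1

    # Read-across style prediction: similarity-weighted mean response.
    pred = sum(wi * yi for wi, yi in zip(w, y))
    # Weighted standard deviation of the close responses around the prediction.
    wsd = math.sqrt(sum(wi * (yi - pred) ** 2 for wi, yi in zip(w, y)))
    cv_pred = wsd / abs(pred) if pred != 0 else 0.0

    avg_sim = total / len(s)
    sd_sim = math.sqrt(sum((si - avg_sim) ** 2 for si in s) / len(s))
    cv_sim = sd_sim / avg_sim if avg_sim != 0 else 0.0

    pos = [sim for sim, resp in close if resp >= threshold]
    neg = [sim for sim, resp in close if resp < threshold]

    return {
        "pred": pred,
        "sd_pred": wsd,
        "cv_pred": cv_pred,
        "avg_sim": avg_sim,
        "sd_sim": sd_sim,
        "cv_sim": cv_sim,
        "max_pos": max(pos) if pos else 0.0,
        "max_neg": max(neg) if neg else 0.0,
    }


# Toy data, invented for illustration only.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [6.0, 7.0, 5.0, 2.0]
desc = rasar_descriptors([0.0, 0.0], train_X, train_y, k=3)
```

In a q-RASAR-style workflow, a row of such measures would be appended to each compound's selected chemical descriptors before fitting a partial least squares model, with the training set supplying the close compounds for both training and query molecules.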