Banerjee Arkaprava, Roy Kunal
Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.
Environ Sci Process Impacts. 2025 May 21;27(5):1229-1243. doi: 10.1039/d5em00068h.
There is a pressing need for quick, accurate, and efficient methods to fill the gaps in the toxicity data of commercial chemicals. It has therefore become essential to develop simple and improved modeling strategies that generate more accurate predictions. Recently, quantitative Read-Across Structure-Activity Relationship (q-RASAR) modeling has been reported to enhance the external predictivity of QSAR models. However, the cross-validation metrics of some q-RASAR models are poorer than those of the corresponding QSAR models. We report here an improved q-RASAR workflow coupled with the Arithmetic Residuals in K-groups Analysis (ARKA) framework. This improved workflow (ARKA-RASAR) considers two important aspects: the contribution of different QSAR descriptors to different experimental response ranges, and the identification of similarity among close congeners based on both the selected QSAR descriptors and those descriptor contributions. A simple, free, and user-friendly Java-based tool, Multiclass ARKA-v1.0, has been developed to compute the multiclass ARKA descriptors. In this study, five different toxicity datasets previously used for the development of QSAR and q-RASAR models were considered. We developed hybrid ARKA models that combine QSAR descriptors and ARKA descriptors. These hybrid feature spaces were used to compute RASAR descriptors and develop ARKA-RASAR models. For a fair comparison, we used the same modeling strategies as those used to develop the previously reported QSAR and q-RASAR models. Additionally, these modeling algorithms are straightforward, reproducible, and transferable. A multi-criteria decision-making statistical approach, the Sum of Ranking Differences (SRD), indicated that the ARKA-RASAR models are the best-performing models, considering training, test, and cross-validation statistics. The least significant difference procedure showed that the SRD values were significantly different for most models, presenting an unbiased workflow. True external validation was also carried out by using relevant ARKA-RASAR models to predict the early-stage acute fish toxicity of a set of pesticide metabolites, and it yielded encouraging results. The promising results and the ease of computation of ARKA and RASAR descriptors using our tools suggest that the ARKA-RASAR modeling framework may be a potential choice for developing highly robust and predictive models for filling the gaps in environmental toxicity data.
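As an illustration of the model comparison mentioned above, the following is a minimal sketch of a Sum of Ranking Differences calculation. It is not the authors' implementation: the function names `average_ranks` and `srd`, the use of the row-wise mean as the reference ranking, and the model names and metric values are all assumptions made for this example. The abstract does not state which reference or data layout was used. Validation steps that usually accompany SRD, such as comparison against random rankings, are left out.

```python
from statistics import mean


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srd(scores):
    """Return {model: SRD} for scores given as {model: [metric values]}.

    Assumption: the reference ranking comes from the row-wise mean over
    all models. A smaller SRD means the model ranks the metrics more
    like the reference does.
    """
    names = list(scores)
    n_rows = len(scores[names[0]])
    reference = [mean(scores[m][k] for m in names) for k in range(n_rows)]
    ref_ranks = average_ranks(reference)
    return {
        m: sum(abs(a - b) for a, b in zip(average_ranks(scores[m]), ref_ranks))
        for m in names
    }


# Hypothetical metric values for three placeholder models (not from the paper).
example_scores = {
    "model_A": [0.70, 0.65, 0.60],
    "model_B": [0.72, 0.60, 0.68],
    "model_C": [0.75, 0.70, 0.69],
}
example_srd = srd(example_scores)
```

Each list holds one model's values for the same ordered set of statistics, so all lists must have equal length. A real comparison would also need error-type metrics oriented the same way as goodness-of-fit metrics before ranking.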