ARKA: a framework of dimensionality reduction for machine-learning classification modeling, risk assessment, and data gap-filling of sparse environmental toxicity data.

Author information

Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.

Publication information

Environ Sci Process Impacts. 2024 Jun 19;26(6):991-1007. doi: 10.1039/d4em00173g.

PMID:38743054

Abstract

Because experimental toxicity data are lacking for many environmental chemicals, data gaps must be filled by in silico approaches. One of the most commonly used approaches for toxicity assessment with small datasets is the Quantitative Structure-Activity Relationship (QSAR), which generates predictive models for query compounds. However, the reliability of predictions from QSARs derived from small datasets is often questionable from a statistical point of view, because the number of descriptors is large relative to the number of training compounds, which reduces the degrees of freedom of the developed model. To reduce the overall prediction error of a given QSAR model, we propose here the computation of novel Arithmetic Residuals in K-groups Analysis (ARKA) descriptors. We reduce the number of modeling descriptors in a supervised manner by partitioning them into K classes (K = 2 here) according to the response class for which each descriptor has the higher mean normalized value, thus preventing the loss of chemical information. A scatter plot of the data points using the values of the two ARKA descriptors (ARKA_2 vs. ARKA_1) can potentially identify activity cliffs, less confident data points, and less modelable data points. We applied the ARKA framework to classification modeling of five representative environmentally relevant endpoints with graded responses (skin sensitization, earthworm toxicity, milk/plasma partitioning, algal toxicity, and rodent carcinogenicity of hazardous chemicals). When models built from conventional QSAR descriptors were compared with those built from ARKA descriptors, decision criteria derived from multiple graded-data validation metrics showed that the ARKA-based models had much better prediction quality, signifying the potential of ARKA descriptors in ecotoxicological classification modeling of small datasets. The same holds for the Read-Across approach: Read-Across predictions using ARKA descriptors outperform those generated from QSAR descriptors. For the ease of users, a Java-based expert system has been developed that computes the ARKA descriptors from input QSAR descriptors.
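The abstract does not state the ARKA formula, so the following is only a minimal sketch of the grouping idea it describes, not the published method or the authors' Java tool. It assumes min-max scaling as the normalization, binary class labels (0 and 1, both present in the training data), and an unweighted sum within each group; the function names are hypothetical, and the actual ARKA descriptors may use a different normalization and weighting.

```python
from typing import List, Sequence, Tuple


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every descriptor (column) to [0, 1]; constant columns map to 0."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def assign_groups(scaled_rows: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> List[int]:
    """Assign each descriptor to the class (0 or 1) with the higher mean
    scaled value. Ties go to class 0 (an arbitrary choice for this sketch)."""
    groups = []
    for j in range(len(scaled_rows[0])):
        means = {}
        for cls in (0, 1):
            vals = [row[j] for row, lab in zip(scaled_rows, labels) if lab == cls]
            means[cls] = sum(vals) / len(vals)
        groups.append(1 if means[1] > means[0] else 0)
    return groups


def grouped_scores(scaled_rows: Sequence[Sequence[float]],
                   groups: Sequence[int]) -> List[Tuple[float, float]]:
    """Collapse all descriptors into two composite values per compound:
    (sum over group-0 descriptors, sum over group-1 descriptors)."""
    out = []
    for row in scaled_rows:
        sums = [0.0, 0.0]
        for value, group in zip(row, groups):
            sums[group] += value
        out.append((sums[0], sums[1]))
    return out


# Toy data: 4 compounds x 3 descriptors, two response classes.
rows = [[1.0, 10.0, 5.0], [2.0, 8.0, 5.0], [9.0, 2.0, 5.0], [10.0, 1.0, 5.0]]
labels = [0, 0, 1, 1]

scaled = min_max_scale(rows)
groups = assign_groups(scaled, labels)
scores = grouped_scores(scaled, groups)
```

Plotting the two composite values against each other for every compound gives the kind of two-dimensional view the abstract mentions. Because each descriptor contributes to exactly one of the two sums, none is discarded outright.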
