Unlocking the Potential of Clustering and Classification Approaches: Navigating Supervised and Unsupervised Chemical Similarity.

Author affiliation

Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.

Publication information

Environ Health Perspect. 2024 Aug;132(8):85002. doi: 10.1289/EHP14001. Epub 2024 Aug 6.

Abstract

BACKGROUND

The field of toxicology has witnessed substantial advancements in recent years, particularly with the adoption of new approach methodologies (NAMs) to understand and predict chemical toxicity. Class-based methods such as clustering and classification are key to NAMs development and application, aiding the understanding of hazard and risk concerns associated with groups of chemicals without additional laboratory work. Advances in computational chemistry, data generation and availability, and machine learning algorithms represent important opportunities for continued improvement of these techniques to optimize their utility for specific regulatory and research purposes. However, because these methods are intricate, a deep understanding and careful selection are imperative to match appropriate methods to their intended applications.

OBJECTIVES

This commentary aims to deepen the understanding of class-based approaches by elucidating the pivotal role of chemical similarity (structural and biological) in clustering and classification approaches (CCAs). It addresses the dichotomy between general end point-agnostic similarity, often entailing unsupervised analysis, and end point-specific similarity necessitating supervised learning. The goal is to highlight the nuances of these approaches, their applications, and common misuses.

DISCUSSION

Understanding similarity is pivotal in toxicological research involving CCAs. The effectiveness of these approaches depends on the right definition and measure of similarity, which varies based on context and objectives of the study. This choice is influenced by how chemical structures are represented and the respective labels indicating biological activity, if applicable. The distinction between unsupervised clustering and supervised classification methods is vital, requiring the use of end point-agnostic vs. end point-specific similarity definition. Separate use or combination of these methods requires careful consideration to prevent bias and ensure relevance for the goal of the study. Unsupervised methods use end point-agnostic similarity measures to uncover general structural patterns and relationships, aiding hypothesis generation and facilitating exploration of datasets without the need for predefined labels or explicit guidance. Conversely, supervised techniques demand end point-specific similarity to group chemicals into predefined classes or to train classification models, allowing accurate predictions for new chemicals. Misuse can arise when unsupervised methods are applied to end point-specific contexts, like analog selection in read-across, leading to erroneous conclusions. This commentary provides insights into the significance of similarity and its role in supervised classification and unsupervised clustering approaches. https://doi.org/10.1289/EHP14001.
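The contrast between end point-agnostic and end point-specific similarity can be illustrated with a small sketch. It is not taken from the commentary. It assumes that each chemical is represented as a set of hypothetical structural feature names, uses the Tanimoto (Jaccard) coefficient as the agnostic measure, and uses a label-weighted Tanimoto variant as one possible end point-specific measure. The chemicals, features, activity labels, and weighting scheme are all invented for illustration.

```python
# Toy illustration only: chemicals, features, and activity labels are hypothetical.
chemicals = {
    "A": {"ring", "chain", "halogen", "nitro"},
    "B": {"ring", "chain", "halogen"},
    "C": {"nitro", "amine"},
    "D": {"amine"},
}
labels = {"A": 1, "B": 0, "C": 1, "D": 0}  # 1 = active for the (assumed) end point


def tanimoto(a, b):
    """End point-agnostic similarity: shared features / all features."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def feature_weights(chems, labs):
    """Weight each feature by how well its presence separates actives
    from inactives (absolute difference in active rate with vs. without it).
    This weighting scheme is an assumption made for the sketch."""
    features = set().union(*chems.values())
    weights = {}
    for f in features:
        with_f = [labs[c] for c in chems if f in chems[c]]
        without_f = [labs[c] for c in chems if f not in chems[c]]
        rate_with = sum(with_f) / len(with_f) if with_f else 0.0
        rate_without = sum(without_f) / len(without_f) if without_f else 0.0
        weights[f] = abs(rate_with - rate_without)
    return weights


def weighted_tanimoto(a, b, weights):
    """End point-specific similarity: Tanimoto over label-derived weights."""
    denom = sum(weights.get(f, 0.0) for f in a | b)
    if denom == 0:
        return 0.0
    return sum(weights.get(f, 0.0) for f in a & b) / denom


w = feature_weights(chemicals, labels)
A, B, C = chemicals["A"], chemicals["B"], chemicals["C"]

print(tanimoto(A, B), tanimoto(A, C))                        # 0.75 0.2
print(weighted_tanimoto(A, B, w), weighted_tanimoto(A, C, w))  # 0.0 1.0
```

In this toy data, the agnostic measure ranks B as the closest neighbor of A because they share most of their structure, whereas the label-weighted measure ranks C closest because only the feature associated with activity counts. This mirrors the kind of mismatch the commentary warns about when an unsupervised similarity is used to pick analogs for a specific end point.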

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0982/11302584/f4d4f07d3ce3/ehp14001_f1.jpg
