GraphATC: advancing multilevel and multi-label anatomical therapeutic chemical classification via atom-level graph learning.

Authors

Zhang Wengyu, Tian Qi, Cao Yi, Fan Wenqi, Jiang Dongmei, Wang Yaowei, Li Qing, Wei Xiao-Yong

Affiliations

Department of Computer Science, Sichuan University, Chengdu 610065, China.

Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong.

Publication details

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf194.

Abstract

Accurate classification of compounds within the Anatomical Therapeutic Chemical (ATC) system is essential for drug development and basic research. Although the problem has attracted significant research attention for over a decade, most prior studies have addressed only the Level 1 labels defined by the World Health Organization (WHO), neglecting the labels of the remaining four levels. This narrow focus fails to treat the task as what it is: a multilevel, multi-label classification problem. Moreover, existing benchmarks such as Chen-2012 and ATC-SMILES have become outdated: they lack the new drugs, and the updated properties of existing drugs, that have been integrated into the WHO ATC system in recent years. To address these shortcomings, we present a comprehensive approach. First, we systematically clean and enrich the drug dataset, expanding it to cover all five levels through a rigorous cross-resource validation process involving KEGG, PubChem, ChEMBL, ChemSpider, and ChemicalBook. This effort yields a new benchmark, ATC-GRAPH. Second, we extend the classification task to Level 2 and introduce graph-based learning techniques to represent drug molecular structures more accurately. This approach not only models polymers, macromolecules, and multi-component drugs more precisely but also improves the overall fidelity of classification. Extensive experiments validate the proposed framework and establish a new state of the art. To facilitate replication of this study, we have made the benchmark dataset, source code, and web server openly accessible.
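
To make the "multilevel, multi-label" framing concrete, the following is a minimal sketch, not the authors' code. It relies only on the standard structure of a WHO ATC code, in which the five levels are the prefixes of length 1, 3, 4, 5, and 7 of the full code. The function names `atc_levels` and `multi_hot`, the example codes, and the small label vocabulary are assumptions chosen for illustration; they do not come from the paper or from the ATC-GRAPH benchmark.

```python
from typing import List

# Prefix lengths of the five ATC levels,
# e.g. A10BA02 -> A, A10, A10B, A10BA, A10BA02.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> List[str]:
    """Split a full 7-character ATC code into its five hierarchical prefixes."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS]


def multi_hot(drug_codes: List[str], level: int, vocabulary: List[str]) -> List[int]:
    """Encode one drug's ATC codes as a multi-hot vector over a single level.

    A drug may carry several ATC codes, so more than one entry can be 1;
    this is what makes the task multi-label.
    """
    if not 1 <= level <= len(ATC_LEVEL_LENGTHS):
        raise ValueError("level must be between 1 and 5")
    labels = {atc_levels(c)[level - 1] for c in drug_codes}
    return [1 if label in labels else 0 for label in vocabulary]


# Illustrative input: one drug with two ATC codes (hypothetical example data).
example_codes = ["A01AD05", "B01AC06"]

# Hypothetical Level 2 label vocabulary, kept tiny for readability.
level2_vocabulary = ["A01", "A10", "B01", "N02"]

level2_target = multi_hot(example_codes, level=2, vocabulary=level2_vocabulary)
```

Here `level2_target` has one position per Level 2 label, and the same drug could be encoded at any other level by changing `level` and supplying that level's vocabulary. A classifier for this task would predict such a vector from a molecular representation; how the paper builds that representation from atom-level graphs is not shown here.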

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a14d/12031726/5f601276ba65/bbaf194f1.jpg
