OnSIDES database: Extracting adverse drug events from drug labels using natural language processing models.

Authors

Tanaka Yutaro, Chen Hsin Yi, Belloni Pietro, Gisladottir Undina, Kefeli Jenna, Patterson Jason, Srinivasan Apoorva, Zietz Michael, Sirdeshmukh Gaurav, Berkowitz Jacob, LaRow Brown Kathleen, Tatonetti Nicholas P

Affiliations

Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University, New York, NY 10032, USA; Department of Applied Physics and Applied Mathematics, Fu Foundation School of Engineering and Applied Sciences, Columbia University, New York, NY 10027, USA.

Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University, New York, NY 10032, USA.

Publication information

Med. 2025 Jul 11;6(7):100642. doi: 10.1016/j.medj.2025.100642. Epub 2025 Apr 2.

Abstract

BACKGROUND

Adverse drug events (ADEs) are the fourth leading cause of death in the US and add billions of dollars to healthcare costs each year. However, few machine-readable databases of ADEs exist, limiting our capacity to study drug safety on a broader, systematic scale. Recent advances in natural language processing methods, such as BERT models, present an opportunity to accurately extract relevant information from unstructured biomedical text.

METHODS

We fine-tune a PubMedBERT model to extract ADE terms from text in FDA Structured Product Labels for prescription drugs. Here, we present OnSIDES (on-label side effects resource), a compiled, machine-friendly database of drug-ADE pairs generated with this method. We further use this method to extract pediatric-specific ADEs, serious ADEs from the labels' "Boxed Warnings" section, and ADEs from drug labels of other major nations (the UK, the European Union, and Japan) to build a complementary OnSIDES-INTL database. To demonstrate OnSIDES' potential applications, we use the database to predict novel drug targets and indications, analyze enrichment of ADEs across drug classes, and predict novel ADEs from chemical compound structures.
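
The abstract does not say which statistical test was used for the enrichment analysis across drug classes. As a rough illustration only, the sketch below assumes a one-sided hypergeometric test over a set of drug-ADE pairs. The drug names, class names, and function names are hypothetical and are not taken from OnSIDES.

```python
from math import comb

def hypergeom_enrichment_p(k, n_class, n_with_ade, n_total):
    """One-sided P(X >= k) when n_class drugs are drawn from n_total drugs,
    of which n_with_ade carry the ADE (hypergeometric tail)."""
    upper = min(n_class, n_with_ade)
    denom = comb(n_total, n_class)
    tail = sum(
        comb(n_with_ade, i) * comb(n_total - n_with_ade, n_class - i)
        for i in range(k, upper + 1)
    )
    return tail / denom

def ade_class_enrichment(pairs, drug_class, target_class, target_ade):
    """P-value for over-representation of target_ade among drugs in target_class."""
    drugs = set(drug_class)
    in_class = {d for d in drugs if drug_class[d] == target_class}
    with_ade = {d for d, a in pairs if a == target_ade and d in drugs}
    k = len(in_class & with_ade)
    return hypergeom_enrichment_p(k, len(in_class), len(with_ade), len(drugs))

# Hypothetical toy data: (drug, ADE) pairs and a drug -> class mapping.
pairs = {
    ("drug_a", "nausea"), ("drug_b", "nausea"), ("drug_c", "nausea"),
    ("drug_d", "headache"), ("drug_e", "rash"), ("drug_f", "headache"),
}
drug_class = {
    "drug_a": "class_1", "drug_b": "class_1", "drug_c": "class_1",
    "drug_d": "class_2", "drug_e": "class_2", "drug_f": "class_2",
}

p_nausea = ade_class_enrichment(pairs, drug_class, "class_1", "nausea")
print(f"class_1 / nausea enrichment p = {p_nausea:.3f}")
```

In a real analysis over many class-ADE combinations, the resulting p-values would also need a multiple-testing correction, which this sketch leaves out.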

FINDINGS

We achieve an F1 score of 0.90, AUROC of 0.92, and AUPR of 0.95 at extracting ADEs from the labels' "Adverse Reactions" section. OnSIDES contains over 3.6 million drug-ADE pairs for 3,233 unique drug ingredient combinations extracted from 47,211 labels.
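
For readers less familiar with the three reported metrics, the following minimal sketch shows how F1, AUROC, and AUPR can be computed for a binary "is this mention an ADE?" classifier. The labels and scores are made-up toy values, not the paper's data. The 0.5 threshold is an assumption, and AUPR is estimated here with average precision, which is one common estimator and may not be the one the authors used.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = ADE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count as half). Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of precision at the rank of each positive; an estimator of AUPR.
    Tied scores are ranked in input order in this simple version."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical toy labels and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

print(f"F1    = {f1_score(y_true, y_pred):.3f}")
print(f"AUROC = {auroc(y_true, scores):.3f}")
print(f"AUPR  = {average_precision(y_true, scores):.3f}")
```

F1 depends on the chosen threshold, whereas AUROC and AUPR summarize performance across all thresholds, which is why the three numbers are reported together.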

CONCLUSIONS

OnSIDES can be used as a comprehensive resource to study and enhance drug safety.

FUNDING

R35GM131905 to N.P.T.; T32GM145440 to H.Y.C.; and T15LM007079 to U.G., M.Z., and K.L.B.
