Tanaka Yutaro, Chen Hsin Yi, Belloni Pietro, Gisladottir Undina, Kefeli Jenna, Patterson Jason, Srinivasan Apoorva, Zietz Michael, Sirdeshmukh Gaurav, Berkowitz Jacob, LaRow Brown Kathleen, Tatonetti Nicholas P
Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University, New York, NY 10032, USA; Department of Applied Physics and Applied Mathematics, Fu Foundation School of Engineering and Applied Sciences, Columbia University, New York, NY 10027, USA.
Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University, New York, NY 10032, USA.
Med. 2025 Jul 11;6(7):100642. doi: 10.1016/j.medj.2025.100642. Epub 2025 Apr 2.
Adverse drug events (ADEs) are the fourth leading cause of death in the US and add billions of dollars to healthcare costs each year. However, few machine-readable databases of ADEs exist, which limits our capacity to study drug safety on a broad, systematic scale. Recent advances in natural language processing methods, such as BERT models, present an opportunity to accurately extract relevant information from unstructured biomedical text.
We fine-tune a PubMedBERT model to extract ADE terms from text in FDA Structured Product Labels for prescription drugs. Here, we present OnSIDES (on-label side effects resource), a compiled, machine-friendly database of drug-ADE pairs generated with this method. We further use this method to extract pediatric-specific ADEs, serious ADEs from labels' "Boxed Warnings" section, and ADEs from drug labels of other major regions (the UK, the European Union, and Japan) to build a complementary OnSIDES-INTL database. To demonstrate OnSIDES' potential applications, we leverage the database to predict novel drug targets and indications, analyze enrichment of ADEs across drug classes, and predict novel ADEs from chemical compound structures.
We achieve an F1 score of 0.90, AUROC of 0.92, and AUPR of 0.95 at extracting ADEs from the labels' "Adverse Reactions" section. OnSIDES contains over 3.6 million drug-ADE pairs for 3,233 unique drug ingredient combinations extracted from 47,211 labels.
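For readers unfamiliar with the F1 metric reported above, the following is a minimal sketch of how precision, recall, and F1 can be computed when extracted drug-ADE pairs are compared against a reference set. It is not the authors' evaluation code: the function name `precision_recall_f1` and the toy drug and ADE names are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(predicted, reference):
    """Return (precision, recall, F1) for two collections of (drug, ADE) pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: fraction of extracted pairs that are in the reference set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference pairs that were extracted.
    recall = true_positives / len(reference) if reference else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, not taken from OnSIDES.
reference = {
    ("drug_a", "nausea"),
    ("drug_a", "headache"),
    ("drug_b", "rash"),
    ("drug_b", "dizziness"),
}
predicted = {
    ("drug_a", "nausea"),
    ("drug_a", "headache"),
    ("drug_b", "rash"),
    ("drug_b", "fatigue"),  # false positive
}

precision, recall, f1 = precision_recall_f1(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

AUROC and AUPR, by contrast, are computed from the model's continuous scores across all decision thresholds rather than from a single set of extracted pairs, so they are not covered by this sketch.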
OnSIDES can be used as a comprehensive resource to study and enhance drug safety.
This work was supported by grants R35GM131905 to N.P.T.; T32GM145440 to H.Y.C.; and T15LM007079 to U.G., M.Z., and K.L.B.