Mining FDA drug labels using an unsupervised learning technique--topic modeling.

Affiliation

Department of Information Science, University of Arkansas at Little Rock, 2801 S. University Ave, Little Rock, AR 72204-1099, USA.

Publication information

BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S11. doi: 10.1186/1471-2105-12-S10-S11.

DOI: 10.1186/1471-2105-12-S10-S11

PMID: 22166012

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3236833/

Abstract

BACKGROUND

Drug labels approved by the Food and Drug Administration (FDA) contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit considerations, and more. However, this information is written in free text that often contains ambiguous semantic descriptions, which makes it difficult to retrieve useful information from labeling text consistently and accurately for comparative analysis across drugs. Consequently, this task has largely relied on experts manually reading the full text, which is time-consuming and labor-intensive.

METHOD

In this study, a novel text-mining method that is unsupervised in nature, called topic modeling, was applied to drug labeling with the goal of discovering "topics" that group together drugs with similar safety concerns and/or therapeutic uses. A total of 794 FDA-approved drug labels were used. First, the free text of three labeling sections of each label (Boxed Warning, Warnings and Precautions, and Adverse Reactions) was mapped to the Medical Dictionary for Regulatory Activities (MedDRA), converting it into standard ADR terms. Next, topic modeling with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on their topic probabilities. Lastly, the effectiveness of topic modeling was evaluated against known information about the drugs' therapeutic uses and safety data.
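The abstract does not say how LDA inference was carried out or exactly how drugs were assigned to topics. As a rough illustration only, the following is a minimal Python sketch of LDA fitted by collapsed Gibbs sampling, in which each drug label is treated as a "document" made of its normalized ADR terms. The drug names, the term lists, `num_topics=2` (the study used 100), the hyperparameters `alpha` and `beta`, and the 0.5 assignment `threshold` in the hypothetical helper `group_by_topic` are all assumptions for this toy example; the term lists stand in for MedDRA output and are not real MedDRA codes.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists (here, normalized ADR terms for each drug label).
    Returns (theta, phi, vocab): per-document topic distributions,
    per-topic term distributions, and the term order used in phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]       # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # times term w is assigned to topic k
    n_k = [0] * K                        # total tokens assigned to topic k
    z = []                               # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w_index[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = w_index[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][wi] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wi] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wi] += 1
                n_k[k] += 1

    # Point estimates of the document-topic and topic-term distributions.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


def group_by_topic(drug_names, theta, threshold=0.5):
    """Hypothetical rule: put a drug in every topic with probability >= threshold."""
    groups = defaultdict(list)
    for name, dist in zip(drug_names, theta):
        for k, p in enumerate(dist):
            if p >= threshold:
                groups[k].append(name)
    return dict(groups)


# Toy, made-up term lists standing in for MedDRA-normalized label sections.
labels = {
    "drug_A": ["hepatotoxicity", "jaundice", "hepatic failure", "hepatotoxicity"],
    "drug_B": ["jaundice", "hepatic failure", "hepatotoxicity"],
    "drug_C": ["renal failure", "nephrotoxicity", "proteinuria"],
    "drug_D": ["nephrotoxicity", "renal failure", "renal failure"],
}
names = list(labels)
theta, phi, vocab = lda_gibbs([labels[n] for n in names], num_topics=2)
groups = group_by_topic(names, theta)
print(groups)
```

On toy data this cleanly separated, the sampler will usually place the liver-related and kidney-related labels in different topics, but which topic index each group receives depends on the random seed. A real analysis of hundreds of labels would more likely rely on a tested library implementation than on a hand-written sampler like this one.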

RESULTS

The results demonstrate that drugs grouped by topic share the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct contexts that can be linked directly to specific adverse events (e.g., liver injury or kidney injury) or therapeutic applications (e.g., antiinfectives for systemic use). We were also able to use topics to identify potential adverse events that might arise from specific medications.
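The abstract reports P<0.05 but does not name the statistical test used. One common way to check whether the drugs in a topic share an annotation (for example, a therapeutic class or a known ADR) more often than chance would predict is a one-sided hypergeometric test, sketched below in Python as an assumption rather than as the authors' method. Only the collection size N = 794 comes from the study; the class size and topic counts in the example call are made-up numbers, and `hypergeom_enrichment_p` is a hypothetical helper name.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of drug labels in the collection
    K: labels in the collection that carry the annotation of interest
    n: labels grouped into the topic
    k: labels in the topic that carry the annotation
    """
    upper = min(K, n)
    # Sum the probabilities of observing k or more annotated labels in the topic.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: 40 of 794 labels belong to one therapeutic class,
# and 6 of the 10 labels grouped into a topic belong to that class.
p_value = hypergeom_enrichment_p(N=794, K=40, n=10, k=6)
print(f"P = {p_value:.3g}")
```

Because many topics are tested at once, a multiple-testing correction (for example, Bonferroni or Benjamini-Hochberg) is often applied before calling an individual topic significant.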

CONCLUSIONS

The successful application of topic modeling to FDA drug labeling demonstrates its potential as a hypothesis-generation tool for inferring hidden relationships between concepts in biomedical documents, such as, in this study, drug safety and therapeutic use.

Similar articles

Study of serious adverse drug reactions using FDA-approved drug labeling and MedDRA.

BMC Bioinformatics. 2019 Mar 14;20(Suppl 2):97. doi: 10.1186/s12859-019-2628-5.

Investigating drug repositioning opportunities in FDA drug labels through topic modeling.

BMC Bioinformatics. 2012;13 Suppl 15(Suppl 15):S6. doi: 10.1186/1471-2105-13-S15-S6. Epub 2012 Sep 11.

Machine learning-based identification and rule-based normalization of adverse drug reactions in drug labels.

BMC Bioinformatics. 2019 Dec 23;20(Suppl 21):707. doi: 10.1186/s12859-019-3195-5.

Review of FDA Amendments Act Section 921 Experience in Posting Data-mining Results from the FAERS Database.

Clin Ther. 2021 Feb;43(2):380-395. doi: 10.1016/j.clinthera.2020.12.011. Epub 2021 Jan 24.

A dataset of 200 structured product labels annotated for adverse drug reactions.

Sci Data. 2018 Jan 30;5:180001. doi: 10.1038/sdata.2018.1.

Mining FDA drug labels for medical conditions.

BMC Med Inform Decis Mak. 2013 Apr 24;13:53. doi: 10.1186/1472-6947-13-53.

Evaluation of Natural Language Processing (NLP) systems to annotate drug product labeling with MedDRA terminology.

J Biomed Inform. 2018 Jul;83:73-86. doi: 10.1016/j.jbi.2018.05.019. Epub 2018 Jun 1.

Mining Adverse Events of Dietary Supplements from Product Labels by Topic Modeling.

Stud Health Technol Inform. 2017;245:614-618.

Ontology-based literature mining and class effect analysis of adverse drug reactions associated with neuropathy-inducing drugs.

J Biomed Semantics. 2018 Jun 7;9(1):17. doi: 10.1186/s13326-018-0185-x.

Cited by

Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents.

Chem Res Toxicol. 2023 Aug 21;36(8):1290-1299. doi: 10.1021/acs.chemrestox.3c00028. Epub 2023 Jul 24.

Psychosocial Needs of Gynecological Cancer Survivors: Mixed Methods Study.

J Med Internet Res. 2022 Sep 20;24(9):e37757. doi: 10.2196/37757.

Comparison of the Erectile Dysfunction Drugs Sildenafil and Tadalafil Using Patient Medication Reviews: Topic Modeling Study.

JMIR Med Inform. 2022 Feb 28;10(2):e32689. doi: 10.2196/32689.

Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs.

PLoS One. 2021 Dec 9;16(12):e0260402. doi: 10.1371/journal.pone.0260402. eCollection 2021.

Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence.

J Pers Med. 2021 Oct 22;11(11):1064. doi: 10.3390/jpm11111064.

Information Extraction From FDA Drug Labeling to Enhance Product-Specific Guidance Assessment Using Natural Language Processing.

Front Res Metr Anal. 2021 Jun 10;6:670006. doi: 10.3389/frma.2021.670006. eCollection 2021.

A systematic review on literature-based discovery workflow.

PeerJ Comput Sci. 2019 Nov 18;5:e235. doi: 10.7717/peerj-cs.235. eCollection 2019.

Patient Triage by Topic Modeling of Referral Letters: Feasibility Study.

JMIR Med Inform. 2020 Nov 6;8(11):e21252. doi: 10.2196/21252.

Current trends in cancer immunotherapy: a literature-mining analysis.

Cancer Immunol Immunother. 2020 Dec;69(12):2425-2439. doi: 10.1007/s00262-020-02630-8. Epub 2020 Jun 15.

Exploring Novel Computable Knowledge in Structured Drug Product Labels.

AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:403-412. eCollection 2020.
