RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling.

Affiliations

Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA.

Office of Surveillance and Epidemiology, FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA.

Publication information

Exp Biol Med (Maywood). 2023 Nov;248(21):1937-1943. doi: 10.1177/15353702231220669. Epub 2024 Jan 2.

DOI: 10.1177/15353702231220669

PMID: 38166420

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10798181/

Abstract

The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for US Food and Drug Administration (FDA) drug reviewers. Because of its large volume and free-text content, conventional text-mining analyses have struggled to process these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) provide an unprecedented opportunity to identify key information in drug labeling, thereby enhancing safety reviews and supporting regulatory decisions. We developed RxBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on FDA human prescription drug labeling documents, to enhance the use of drug labeling in both research and drug review. RxBERT was derived from BioBERT by further training on human prescription drug labeling documents. RxBERT was evaluated on several tasks using regulatory datasets: the National Institute of Standards and Technology Text Analysis Conference dataset (NIST TAC dataset), the FDA Adverse Drug Event Evaluation dataset (ADE Eval dataset), and the classification of texts from submission packages into labeling sections (US Drug Labeling dataset). RxBERT reached an F1-score of 86.5 on both the TAC and ADE Eval classification tasks and a prediction accuracy of 87% on the US Drug Labeling dataset. Overall, RxBERT performed as well as or better than other NLP approaches such as BERT and BioBERT. In summary, we developed RxBERT, a transformer-based model specific to drug labeling that outperformed the original BERT model. RxBERT could help research scientists and FDA reviewers better process and use drug labeling information to advance drug effectiveness and safety for public health. This proof-of-concept study also demonstrated a potential pathway to customized large language models (LLMs) tailored to sensitive regulatory documents for internal use.
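The abstract reports F1-scores for the TAC and ADE Eval classification tasks and accuracy for the US Drug Labeling section-classification task. As a reminder of what those numbers measure, here is a minimal sketch of both metrics in plain Python. It assumes binary labels for F1 (the abstract does not say whether the reported F1 is binary, micro-averaged, or macro-averaged), and every label below is a made-up toy value, not data from the study. The abstract reports F1 on a 0 to 100 scale, so 86.5 would correspond to 0.865 here.

```python
from typing import Hashable, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the `positive` class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        return 0.0  # no true positives: treat F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of items whose predicted label equals the true label (works for multi-class)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy binary labels: 1 = text mentions an adverse event, 0 = it does not.
ade_true = [1, 1, 1, 0, 0, 1, 0, 1]
ade_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(f"F1 = {f1_score(ade_true, ade_pred):.3f}")  # tp=4, fp=1, fn=1 -> F1 = 0.800

# Hypothetical toy multi-class labels: the labeling section a text fragment belongs to.
section_true = ["INDICATIONS AND USAGE", "ADVERSE REACTIONS", "WARNINGS AND PRECAUTIONS", "ADVERSE REACTIONS"]
section_pred = ["INDICATIONS AND USAGE", "ADVERSE REACTIONS", "ADVERSE REACTIONS", "ADVERSE REACTIONS"]
print(f"accuracy = {accuracy(section_true, section_pred):.2f}")  # 3 of 4 correct -> accuracy = 0.75
```

In practice a library routine such as scikit-learn's `f1_score` (with its `average` parameter choosing binary, micro, or macro averaging) would typically be used; the hand-rolled version only makes the definitions explicit.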

Similar articles

Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents.

Chem Res Toxicol. 2023 Aug 21;36(8):1290-1299. doi: 10.1021/acs.chemrestox.3c00028. Epub 2023 Jul 24.

BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk.

Front Artif Intell. 2021 Dec 6;4:729834. doi: 10.3389/frai.2021.729834. eCollection 2021.

Text summarization with ChatGPT for drug labeling documents.

Drug Discov Today. 2024 Jun;29(6):104018. doi: 10.1016/j.drudis.2024.104018. Epub 2024 May 7.

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.

PharmBERT: a domain-specific BERT model for drug labels.

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad226.

Automatic Extraction of Comprehensive Drug Safety Information from Adverse Drug Event Narratives in the Korea Adverse Event Reporting System Using Natural Language Processing Techniques.

Drug Saf. 2023 Aug;46(8):781-795. doi: 10.1007/s40264-023-01323-2. Epub 2023 Jun 17.

BERT-based language model for accurate drug adverse event extraction from social media: implementation, evaluation, and contributions to pharmacovigilance practices.

Front Public Health. 2024 Apr 23;12:1392180. doi: 10.3389/fpubh.2024.1392180. eCollection 2024.

Few-Shot Learning for Clinical Natural Language Processing Using Siamese Neural Networks: Algorithm Development and Validation Study.

JMIR AI. 2023 May 4;2:e44293. doi: 10.2196/44293.

Artificial intelligence-powered pharmacovigilance: A review of machine and deep learning in clinical text-based adverse drug event detection for benchmark datasets.

J Biomed Inform. 2024 Apr;152:104621. doi: 10.1016/j.jbi.2024.104621. Epub 2024 Mar 5.

Cited by

Assessment of the Efficiency of a ChatGPT-Based Tool, MyGenAssist, in an Industry Pharmacovigilance Department for Case Documentation: Cross-Over Study.

J Med Internet Res. 2025 Mar 10;27:e65651. doi: 10.2196/65651.

The applications and advances of artificial intelligence in drug regulation: A global perspective.

Acta Pharm Sin B. 2025 Jan;15(1):1-14. doi: 10.1016/j.apsb.2024.11.006. Epub 2024 Nov 13.

Leveraging FDA Labeling Documents and Large Language Model to Enhance Annotation, Profiling, and Classification of Drug Adverse Events with AskFDALabel.

Drug Saf. 2025 Jun;48(6):655-665. doi: 10.1007/s40264-025-01520-1. Epub 2025 Feb 20.

Description and Validation of a Novel AI Tool, LabelComp, for the Identification of Adverse Event Changes in FDA Labeling.

Drug Saf. 2024 Dec;47(12):1265-1274. doi: 10.1007/s40264-024-01468-8. Epub 2024 Jul 31.

Integrating artificial intelligence with bioinformatics promotes public health.

Exp Biol Med (Maywood). 2023 Nov;248(21):1905-1907. doi: 10.1177/15353702231223575.
