Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA.
Office of Surveillance and Epidemiology, FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA.
Exp Biol Med (Maywood). 2023 Nov;248(21):1937-1943. doi: 10.1177/15353702231220669. Epub 2024 Jan 2.
The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for Food and Drug Administration (FDA) drug reviewers. Because of the extensive volume of these documents and their free-text content, conventional text mining analyses have encountered challenges in processing these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) provide an unprecedented opportunity to identify key information in drug labeling, thereby enhancing safety reviews and supporting regulatory decisions. We developed RxBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on FDA human prescription drug labeling documents, to improve the use of drug labeling documents in both research and drug review. RxBERT was derived from BioBERT through further training on human prescription drug labeling documents. RxBERT was evaluated on several tasks using regulatory datasets: the National Institute of Standards and Technology Text Analysis Conference dataset (NIST TAC dataset), the FDA Adverse Drug Event Evaluation dataset (ADE Eval dataset), and the classification of texts from submission packages into labeling sections (US Drug Labeling dataset). RxBERT reached an F1-score of 86.5 in both the TAC and ADE Eval classification tasks and a prediction accuracy of 87% on the US Drug Labeling dataset. Overall, RxBERT performed as well as or better than other NLP approaches such as BERT and BioBERT. In summary, we developed RxBERT, a transformer-based model specific to drug labeling that outperformed the original BERT model. RxBERT has the potential to help research scientists and FDA reviewers better process and use drug labeling information to advance drug effectiveness and safety for public health.
This proof-of-concept study also demonstrated a potential pathway to customized large language models (LLMs) tailored to sensitive regulatory documents for internal use.