BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk.

Authors

Wu Yue, Liu Zhichao, Wu Leihong, Chen Minjun, Tong Weida

Affiliations

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, United States Food and Drug Administration, Jefferson, AR, United States.

Publication information

Front Artif Intell. 2021 Dec 6;4:729834. doi: 10.3389/frai.2021.729834. eCollection 2021.

Abstract

The United States Food and Drug Administration (FDA) regulates a broad range of consumer products, which account for about 25% of the United States market. FDA regulatory activities often involve producing and reading large numbers of documents, which is time-consuming and labor-intensive. To support regulatory science at the FDA, we evaluated artificial intelligence (AI)-based natural language processing (NLP) of regulatory documents for text classification and compared deep learning-based models with a conventional keyword-based model. FDA drug labeling documents were used as a representative regulatory data source to classify drug-induced liver injury (DILI) risk with the state-of-the-art language model BERT. The resulting NLP-DILI classification model was statistically validated with both internal and external validation procedures and then applied to labeling data from the European Medicines Agency (EMA) to test cross-agency use. Developed on FDA labeling documents and evaluated by cross-validation, the NLP-DILI model achieved a recall of 1 and a precision of 0.78 in DILI classification. When cross-agency data were used to validate the model, performance remained comparable, demonstrating that the model was portable across agencies. The results also suggested that the model captured the semantic meaning of sentences in drug labeling. Overall, deep learning-based NLP models performed well in DILI classification of drug labeling documents and learned the meaning of complex labeling text. This proof-of-concept work demonstrated that using AI technologies to assist regulatory activities is a promising approach to modernizing and advancing regulatory science.
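
To make the two reported metrics concrete, the sketch below shows how precision and recall are computed from binary predictions and what F1 score follows arithmetically from a recall of 1 and a precision of 0.78. The function names and the toy labels are hypothetical illustrations, not the authors' code or data, and the F1 value is derived here rather than reported in the abstract.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 = DILI-positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (NOT from the study): every positive is found,
# at the cost of one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0]
toy_precision, toy_recall = precision_recall(y_true, y_pred)

# F1 implied by the values reported in the abstract (derived, not reported).
implied_f1 = f1_score(0.78, 1.0)

print(f"toy precision={toy_precision:.2f}, recall={toy_recall:.2f}")
print(f"F1 implied by precision 0.78 and recall 1: {implied_f1:.3f}")
```

A recall of 1 means no DILI-positive label was missed, while a precision of 0.78 means roughly one in five positive calls was a false alarm. That trade-off may suit a screening setting where missed risks cost more than extra review.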

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2aa2/8685544/f1a9ed202696/frai-04-729834-g001.jpg