Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents.

Affiliations

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, United States Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079 United States.

Publication information

Chem Res Toxicol. 2023 Aug 21;36(8):1290-1299. doi: 10.1021/acs.chemrestox.3c00028. Epub 2023 Jul 24.

PMID:37487037

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10445280/

Abstract

The US Food and Drug Administration (FDA) regulatory process often involves several reviewers, each focusing on the information relevant to their own area of review. Accordingly, manufacturers that provide submission packages to regulatory agencies are instructed to organize the contents using a structure that enables the information to be easily allocated, retrieved, and reviewed. However, this practice is not always followed correctly; as a result, some documents are poorly structured, with similar information spread across different sections, which hinders efficient access to, and review of, all of the relevant data as a whole. To improve this common situation, we evaluated an artificial intelligence (AI)-based natural language processing (NLP) methodology, Bidirectional Encoder Representations from Transformers (BERT), to automatically classify free-text information into standardized sections, supporting a holistic review of drug safety and efficacy. Specifically, FDA labeling documents were used in this study as a proof of concept: the labeling section structure defined by the Physician Labeling Rule (PLR) served as the set of target classes during model development. The model was subsequently evaluated on texts from both well-structured labeling documents (i.e., PLR-based labeling) and less or differently structured documents (i.e., non-PLR and Summary of Product Characteristics [SmPC] labeling). In training, the model achieved 96% and 88% accuracy on the binary and multiclass tasks, respectively. On the PLR, non-PLR, and SmPC test data sets, the testing accuracies were 95%, 88%, and 88% for the binary model and 82%, 73%, and 68% for the multiclass model, respectively. Our study demonstrated that automatically classifying free texts into standardized sections with AI language models could be an advanced regulatory science approach for supporting the review process by effectively processing unformatted documents.
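The abstract reports accuracy for a binary and a multiclass formulation of the same section-classification problem. The sketch below illustrates, under stated assumptions, how labeled text segments might be turned into targets for each formulation and scored. The section names are an illustrative subset of PLR headings (the paper's exact label set is not given in the abstract), the one-vs-rest binary framing is an assumption (the paper's binary task may be defined differently), and the BERT model itself is not shown: predictions are passed in as plain lists.

```python
# Illustrative subset of PLR Full Prescribing Information headings used as class
# labels (assumption: the paper's exact label set is not stated in the abstract).
SECTIONS = [
    "INDICATIONS AND USAGE",
    "DOSAGE AND ADMINISTRATION",
    "CONTRAINDICATIONS",
    "WARNINGS AND PRECAUTIONS",
    "ADVERSE REACTIONS",
]
SECTION_INDEX = {name: i for i, name in enumerate(SECTIONS)}


def multiclass_labels(examples):
    """Map each (text, section) pair to the integer index of its section."""
    return [SECTION_INDEX[section] for _, section in examples]


def binary_labels(examples, target):
    """Hypothetical one-vs-rest framing: 1 if the text belongs to `target`, else 0."""
    return [1 if section == target else 0 for _, section in examples]


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy free-text segments paired with the PLR section they came from (made up).
    examples = [
        ("Take one tablet by mouth once daily.", "DOSAGE AND ADMINISTRATION"),
        ("Nausea was the most common reaction.", "ADVERSE REACTIONS"),
        ("Do not use in patients with known hypersensitivity.", "CONTRAINDICATIONS"),
        ("Indicated for the treatment of hypertension.", "INDICATIONS AND USAGE"),
    ]
    gold = multiclass_labels(examples)   # [1, 4, 2, 0]
    predicted = [1, 4, 3, 0]             # stand-in for a classifier's output
    print(accuracy(gold, predicted))     # 0.75
    print(binary_labels(examples, "ADVERSE REACTIONS"))  # [0, 1, 0, 0]
```

Accuracy is simply the share of correctly classified segments, so the same function scores both formulations; in the study, the predictions would come from the fine-tuned BERT model rather than a hard-coded list.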

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d7f/10445280/89836a8d0bc8/tx3c00028_0001.jpg

Similar articles

RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling.

Exp Biol Med (Maywood). 2023 Nov;248(21):1937-1943. doi: 10.1177/15353702231220669. Epub 2024 Jan 2.

BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk.

Front Artif Intell. 2021 Dec 6;4:729834. doi: 10.3389/frai.2021.729834. eCollection 2021.

Text summarization with ChatGPT for drug labeling documents.

Drug Discov Today. 2024 Jun;29(6):104018. doi: 10.1016/j.drudis.2024.104018. Epub 2024 May 7.

Fine-tuning BERT for automatic ADME semantic labeling in FDA drug labeling to enhance product-specific guidance assessment.

J Biomed Inform. 2023 Feb;138:104285. doi: 10.1016/j.jbi.2023.104285. Epub 2023 Jan 9.

Overdosage Section in US and EU Labeling.

Ther Innov Regul Sci. 2024 Sep;58(5):946-952. doi: 10.1007/s43441-024-00673-y. Epub 2024 Jun 17.

PharmBERT: a domain-specific BERT model for drug labels.

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad226.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Mining FDA drug labels using an unsupervised learning technique--topic modeling.

BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S11. doi: 10.1186/1471-2105-12-S10-S11.

Information Extraction From FDA Drug Labeling to Enhance Product-Specific Guidance Assessment Using Natural Language Processing.

Front Res Metr Anal. 2021 Jun 10;6:670006. doi: 10.3389/frma.2021.670006. eCollection 2021.

Cited by

RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling.

Exp Biol Med (Maywood). 2023 Nov;248(21):1937-1943. doi: 10.1177/15353702231220669. Epub 2024 Jan 2.

Transforming clinical trials: the emerging roles of large language models.

Transl Clin Pharmacol. 2023 Sep;31(3):131-138. doi: 10.12793/tcp.2023.31.e16. Epub 2023 Sep 19.
