Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, United States Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, United States.
Chem Res Toxicol. 2023 Aug 21;36(8):1290-1299. doi: 10.1021/acs.chemrestox.3c00028. Epub 2023 Jul 24.
The US Food and Drug Administration (FDA) regulatory process often involves several reviewers, each focusing on the information relevant to their own area of review. Accordingly, manufacturers that provide submission packages to regulatory agencies are instructed to organize the contents using a structure that enables the information to be easily allocated, retrieved, and reviewed. However, this practice is not always followed correctly; as a result, some documents are poorly structured, with similar information spread across different sections, which hinders efficient access to and review of the relevant data as a whole. To improve this common situation, we evaluated an artificial intelligence (AI)-based natural language processing (NLP) methodology, called Bidirectional Encoder Representations from Transformers (BERT), to automatically classify free-text information into standardized sections, supporting a holistic review of drug safety and efficacy. Specifically, FDA labeling documents were used in this study as a proof of concept, where the labeling section structure defined by the Physician Labeling Rule (PLR) was used to classify labeling text in the development of the model. The model was subsequently evaluated on texts from both well-structured labeling documents (i.e., PLR-based labeling) and less- or differently structured documents (i.e., non-PLR and Summary of Product Characteristics [SmPC] labeling). In the training process, the model yielded 96% and 88% accuracy for the binary and multiclass tasks, respectively. The testing accuracies observed for the PLR, non-PLR, and SmPC testing data sets were 95%, 88%, and 88% for the binary model and 82%, 73%, and 68% for the multiclass model, respectively. Our study demonstrated that automatically classifying free texts into standardized sections with AI language models could be an advanced regulatory science approach for supporting the review process by effectively processing unformatted documents.
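The accuracy figures above are fractions of texts assigned to the correct section. As a minimal sketch of how such a figure might be computed for a multiclass and a binary task, the Python below uses invented section labels and predictions; none of these values come from the study. The abstract does not say how the binary task was defined, so the one-section-versus-rest reduction in `to_binary` is purely an assumption for illustration, and the function names are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of texts whose predicted section matches the true section."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def to_binary(labels, positive_section):
    """Collapse section labels to True/False for one section versus the rest.

    Assumption for illustration only: the abstract does not describe
    how the binary task was constructed.
    """
    return [label == positive_section for label in labels]


# Hypothetical true and predicted sections for five text snippets.
true_sections = ["Indications", "Warnings", "Adverse Reactions", "Warnings", "Indications"]
pred_sections = ["Indications", "Adverse Reactions", "Adverse Reactions", "Warnings", "Indications"]

# Multiclass: every section must match exactly (4 of 5 correct here).
multiclass_acc = accuracy(true_sections, pred_sections)

# Binary: only "Indications" versus everything else (5 of 5 correct here).
binary_acc = accuracy(
    to_binary(true_sections, "Indications"),
    to_binary(pred_sections, "Indications"),
)

print(f"multiclass accuracy: {multiclass_acc:.2f}")
print(f"binary accuracy: {binary_acc:.2f}")
```

In this toy data the one multiclass error confuses two non-"Indications" sections, so it disappears in the binary view. A coarser task can score higher for this reason, although the abstract does not state the cause of the gap between its binary and multiclass results.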