Wu Yue, Liu Zhichao, Wu Leihong, Chen Minjun, Tong Weida
Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, United States Food and Drug Administration, Jefferson, AR, United States.
Front Artif Intell. 2021 Dec 6;4:729834. doi: 10.3389/frai.2021.729834. eCollection 2021.
The United States Food and Drug Administration (FDA) regulates a broad range of consumer products, which account for about 25% of the United States market. FDA regulatory activities often involve producing and reading large numbers of documents, which is time-consuming and labor-intensive. To support regulatory science at the FDA, we evaluated artificial intelligence (AI)-based natural language processing (NLP) for text classification of regulatory documents and compared deep learning-based models with a conventional keyword-based model. FDA drug labeling documents served as a representative regulatory data source, and the state-of-the-art language model BERT was used to classify drug-induced liver injury (DILI) risk. The resulting NLP-DILI classification model was statistically validated with both internal and external validation procedures and then applied to labeling data from the European Medicines Agency (EMA) to test cross-agency use. Developed on FDA labeling documents and evaluated by cross-validation, the NLP-DILI model classified DILI with a recall of 1 and a precision of 0.78. When cross-agency data were used to validate the model, performance remained comparable, demonstrating that the model was portable across agencies. The results also suggested that the model captured the semantic meaning of sentences in drug labeling. Overall, deep learning-based NLP models performed well in DILI classification of drug labeling documents and learned the meaning of complex labeling text. This proof-of-concept work demonstrates that using AI technologies to assist regulatory activities is a promising approach to modernizing and advancing regulatory science.
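To make the reported metrics concrete, the following minimal Python sketch shows how a keyword-based baseline of the kind mentioned above might label sentences, and how precision and recall are computed from its predictions. The keyword list, the example sentences, and all function names are hypothetical and chosen only for illustration; they are not taken from the study. The F1 value at the end is simple arithmetic on the reported recall of 1 and precision of 0.78, not a figure stated in the abstract.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical keyword list for illustration only; the study's actual
# keywords are not given in the abstract.
DILI_KEYWORDS = ("hepatotoxicity", "liver injury", "hepatic failure", "jaundice")


def keyword_predict(sentence: str, keywords: Iterable[str] = DILI_KEYWORDS) -> int:
    """Return 1 if any keyword occurs in the sentence (case-insensitive), else 0."""
    text = sentence.lower()
    return int(any(keyword in text for keyword in keywords))


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute precision = TP / (TP + FP) and recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Made-up labeling sentences with made-up labels (1 = DILI-related).
sentences = [
    "Cases of severe liver injury have been reported.",
    "Jaundice resolved after discontinuation.",
    "No evidence of hepatotoxicity was observed in clinical trials.",
    "Store at room temperature.",
]
labels = [1, 1, 0, 0]

predictions = [keyword_predict(s) for s in sentences]
toy_precision, toy_recall = precision_recall(labels, predictions)

# F1 implied by the reported precision (0.78) and recall (1).
reported_f1 = f1_score(0.78, 1.0)

if __name__ == "__main__":
    print(predictions, toy_precision, toy_recall, round(reported_f1, 3))
```

In this toy data the third sentence negates the keyword, so plain keyword matching flags it incorrectly. That is the kind of case where a model that captures sentence meaning could be expected to do better, although the sketch itself does not demonstrate this.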