Shi Yiwen, Wang Jing, Ren Ping, ValizadehAslani Taha, Zhang Yi, Hu Meng, Liang Hualou
College of Computing and Informatics, Drexel University, Philadelphia, PA, United States.
Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, United States.
J Biomed Inform. 2023 Feb;138:104285. doi: 10.1016/j.jbi.2023.104285. Epub 2023 Jan 9.
Product-specific guidances (PSGs) recommended by the United States Food and Drug Administration (FDA) are instrumental in promoting and guiding generic drug product development. To assess a PSG, an FDA assessor must spend extensive time and effort manually retrieving supportive drug information on absorption, distribution, metabolism, and excretion (ADME) from the reference listed drug labeling. In this work, we leveraged state-of-the-art pre-trained language models to automatically label the ADME paragraphs in the pharmacokinetics section of FDA-approved drug labeling to facilitate PSG assessment. We applied a transfer learning approach, fine-tuning the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to develop a novel application of ADME semantic labeling that automatically retrieves ADME paragraphs from drug labeling in place of manual work. We demonstrate that fine-tuning the pre-trained BERT model outperforms conventional machine learning techniques, achieving up to a 12.5% absolute improvement in F1 score. To our knowledge, this is the first successful application of BERT to the ADME semantic labeling task. We further assessed the relative contributions of pre-training and fine-tuning to the overall performance of the BERT model on the ADME semantic labeling task using a series of analysis methods, including attention similarity and layer-based ablations. Our analysis revealed that the information learned through fine-tuning is concentrated on task-specific knowledge in the top layers of BERT, whereas the benefit of pre-training comes from the bottom layers.
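The abstract does not include code; as a rough illustration of the described approach, the sketch below fine-tunes a generic BERT checkpoint for paragraph-level ADME classification with the Hugging Face transformers library. The checkpoint name (bert-base-uncased), the four-way label set, and the hyperparameters are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch only (not the authors' released code): fine-tuning a
# pre-trained BERT model to classify drug-labeling paragraphs into assumed
# ADME categories, given a list of (paragraph, label) pairs.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["absorption", "distribution", "metabolism", "excretion"]  # assumed label set


class AdmeDataset(Dataset):
    """Tokenizes (paragraph, label) pairs into tensors for BERT."""

    def __init__(self, texts, labels, tokenizer, max_len=512):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item


def fine_tune(train_texts, train_labels, epochs=3, lr=2e-5):
    """Fine-tune bert-base-uncased as a paragraph classifier over ADME labels."""
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(LABELS))
    loader = DataLoader(AdmeDataset(train_texts, train_labels, tokenizer),
                        batch_size=8, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = model(**batch).loss  # cross-entropy over the ADME classes
            loss.backward()
            optimizer.step()
    return tokenizer, model
```

A layer-based ablation of the kind mentioned in the abstract could then be approximated by freezing subsets of encoder layers (e.g., via `model.bert.encoder.layer[i].requires_grad_(False)`) before calling `fine_tune`, though the paper's exact ablation protocol is not specified here.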