Toward an Explainable Large Language Model for the Automatic Identification of the Drug-Induced Liver Injury Literature.

Affiliation

JMP Statistical Discovery, LLC, Cary, North Carolina 27513, United States.

Publication information

Chem Res Toxicol. 2024 Sep 16;37(9):1524-1534. doi: 10.1021/acs.chemrestox.4c00134. Epub 2024 Aug 27.

Abstract

Drug-induced liver injury (DILI) stands as a significant concern in drug safety, representing the primary cause of acute liver failure. Identifying the scientific literature related to DILI is crucial for monitoring, investigating, and conducting meta-analyses of drug safety issues. Given the intricate and often obscure nature of drug interactions, simple keyword searching can be insufficient for the exhaustive retrieval of the DILI-relevant literature. Manual curation of DILI-related publications demands pharmaceutical expertise and is susceptible to errors, severely limiting throughput. Despite numerous efforts utilizing cutting-edge natural language processing and deep learning techniques to automatically identify the DILI-related literature, their performance remains suboptimal for real-world applications in clinical research and regulatory contexts. In the past year, large language models (LLMs) such as ChatGPT and its open-source counterpart LLaMA have achieved groundbreaking progress in natural language understanding and question answering, paving the way for the automated, high-throughput identification of the DILI-related literature and subsequent analysis. Leveraging a large-scale public dataset comprising 14 203 training publications from the CAMDA 2022 literature AI challenge, we have developed what we believe to be the first LLM specialized in DILI analysis based on LLaMA-2. In comparison with other smaller language models such as BERT, GPT, and their variants, LLaMA-2 exhibits an enhanced out-of-fold accuracy of 97.19% and area under the ROC curve of 0.9947 using 3-fold cross-validation on the training set. Despite LLMs' initial design for dialogue systems, our study illustrates their successful adaptation into accurate classifiers for automated identification of the DILI-related literature from vast collections of documents. 
This work is a step toward unleashing the potential of LLMs in the context of regulatory science and facilitating the regulatory review process.
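The out-of-fold accuracy and ROC AUC quoted in the abstract come from 3-fold cross-validation: each publication is scored by a model that was trained without it, and the metrics are then computed over all held-out scores together. The following is a minimal, standard-library sketch of that evaluation scheme, not the authors' pipeline. The keyword scorer `fit_keyword_model`, the helper names, and the six toy titles are all hypothetical stand-ins; the paper's classifier is a LLaMA-2 model trained on the CAMDA 2022 data.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str], float]


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k interleaved folds (deterministic, no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def out_of_fold_scores(
    texts: Sequence[str],
    labels: Sequence[int],
    fit: Callable[[List[str], List[int]], Scorer],
    k: int = 3,
) -> List[float]:
    """Score every document with a model that never saw it during training."""
    n = len(texts)
    scores = [0.0] * n
    for held_out in kfold_indices(n, k):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit([texts[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            scores[i] = model(texts[i])
    return scores


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of documents whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_keyword_model(texts: List[str], labels: List[int]) -> Scorer:
    """Hypothetical stand-in classifier: score by words seen only in positive training texts."""
    pos_words = {w for t, y in zip(texts, labels) if y == 1 for w in t.lower().split()}
    neg_words = {w for t, y in zip(texts, labels) if y == 0 for w in t.lower().split()}
    only_pos = pos_words - neg_words

    def score(text: str) -> float:
        words = text.lower().split()
        return sum(w in only_pos for w in words) / len(words)

    return score


# Toy data (invented for illustration): 1 = DILI-related, 0 = not DILI-related.
toy_texts = [
    "acute liver injury after amoxicillin therapy",
    "cardiac outcomes in statin users",
    "hepatotoxicity and liver failure from isoniazid",
    "renal function in diabetic patients",
    "drug induced liver injury case report",
    "asthma control in pediatric patients",
]
toy_labels = [1, 0, 1, 0, 1, 0]

oof = out_of_fold_scores(toy_texts, toy_labels, fit_keyword_model, k=3)
print("out-of-fold accuracy:", accuracy(toy_labels, oof, threshold=0.1))
print("out-of-fold ROC AUC:", roc_auc(toy_labels, oof))
```

Pooling the held-out scores before computing the metrics, rather than averaging per-fold metrics, is one common reading of "out-of-fold"; the abstract does not say which variant was used, so that choice here is an assumption.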

