Enhancing pharmacogenomic data accessibility and drug safety with large language models: a case study with Llama3.1.

Author information

Li Dan, Wu Leihong, Lin Ying-Chi, Huang Ho-Yin, Cotton Ebony, Liu Qi, Chen Ru, Huang Ruihao, Zhang Yifan, Xu Joshua

Affiliations

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States.

School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan.

Publication information

Exp Biol Med (Maywood). 2024 Dec 3;249:10393. doi: 10.3389/ebm.2024.10393. eCollection 2024.

Abstract

Pharmacogenomics (PGx) holds the promise of personalizing medical treatments based on individual genetic profiles, thereby enhancing drug efficacy and safety. However, the current landscape of PGx research is hindered by fragmented data sources, time-consuming manual data extraction processes, and the need for comprehensive and up-to-date information. This study aims to address these challenges by evaluating the ability of Large Language Models (LLMs), specifically Llama3.1-70B, to automate and improve the accuracy of PGx information extraction from the FDA Table of Pharmacogenomic Biomarkers in Drug Labeling (FDA PGx Biomarker table), which is well-structured with drug names, biomarkers, therapeutic areas, and related labeling texts. Our primary goal was to test the feasibility of LLMs in streamlining PGx data extraction, as an alternative to traditional, labor-intensive approaches. Llama3.1-70B achieved 91.4% accuracy in identifying drug-biomarker pairs from single labeling texts and 82% from mixed texts, with over 85% consistency in aligning extracted PGx categories from the FDA PGx Biomarker table and relevant scientific abstracts, demonstrating its effectiveness for PGx data extraction. By integrating data from diverse sources, including scientific abstracts, this approach can support pharmacologists, regulatory bodies, and healthcare researchers in updating PGx resources more efficiently, making critical information more accessible for applications in personalized medicine. In addition, this approach shows potential for discovering novel PGx information, particularly for underrepresented minority ethnic groups. This study highlights the ability of LLMs to enhance the efficiency and completeness of PGx research, thus laying a foundation for advancements in personalized medicine by ensuring that drug therapies are tailored to the genetic profiles of diverse populations.
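To make the reported accuracy figures concrete, the following is a minimal sketch of how extracted drug-biomarker pairs might be scored against a reference table. It is not the authors' evaluation code. The abstract does not define how accuracy was computed, so the exact-match-per-text criterion used here is an assumption, as are the function names `normalize` and `exact_match_accuracy`. The three rows are illustrative toy inputs, not data from the study.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (drug, biomarker)


def normalize(pair: Pair) -> Pair:
    """Case-fold and trim so that formatting differences do not count as errors."""
    drug, biomarker = pair
    return drug.strip().lower(), biomarker.strip().upper()


def exact_match_accuracy(extracted: List[Set[Pair]],
                         reference: List[Set[Pair]]) -> float:
    """Fraction of labeling texts whose extracted pair set equals the reference set.

    Assumption: a text counts as correct only if every reference pair is found
    and no extra pair is reported. The study may have used a different criterion.
    """
    if len(extracted) != len(reference):
        raise ValueError("extracted and reference must cover the same texts")
    if not reference:
        raise ValueError("no texts to score")
    hits = 0
    for got, want in zip(extracted, reference):
        if {normalize(p) for p in got} == {normalize(p) for p in want}:
            hits += 1
    return hits / len(reference)


# Illustrative toy inputs only: one entry per labeling text.
reference = [
    {("Abacavir", "HLA-B")},
    {("Clopidogrel", "CYP2C19")},
    {("Warfarin", "CYP2C9"), ("Warfarin", "VKORC1")},
]
extracted = [
    {("abacavir", "hla-b")},         # differs only in case, counted as correct
    {("Clopidogrel", "CYP2C19")},    # correct
    {("Warfarin", "CYP2C9")},        # misses VKORC1, counted as incorrect
]

score = exact_match_accuracy(extracted, reference)
print(f"{score:.3f}")
```

A stricter or looser criterion (for example, per-pair precision and recall) would give different numbers on the same outputs, which is why the scoring rule should be stated alongside any accuracy figure.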
