
Benchmarking large language models for replication of guideline-based PGx recommendations.

Author information

Zack Mike, Slobodchikov Ioan, Stupichev Danil, Moore Alex, Sokolov David, Trifonov Igor, Gobbs Allan

Affiliations

PGxAI Inc., 330 E Charleston Rd, Palo Alto, CA, 94306, USA.

Publication information

Pharmacogenomics J. 2025 Jul 26;25(4):23. doi: 10.1038/s41397-025-00383-0.

Abstract

We evaluated the ability of large language models (LLMs) to generate clinically accurate pharmacogenomic (PGx) recommendations aligned with CPIC guidelines. Using a benchmark of 599 curated gene-drug-phenotype scenarios, we compared five leading models, including GPT-4o and fine-tuned LLaMA variants, through both standard lexical metrics and a novel semantic evaluation framework (LLM Score) validated by expert review. General-purpose models frequently produced incomplete or unsafe outputs, while our domain-adapted model achieved superior performance, with an LLM Score of 0.92 and significantly faster inference speed. Results highlight the importance of fine-tuning and structured prompting over model scale alone. This work establishes a robust framework for evaluating PGx-specific LLMs and demonstrates the feasibility of safer, AI-driven personalized medicine.
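As a rough illustration of why a benchmark might pair lexical metrics with a semantic score, the sketch below computes a simple token-overlap F1 between a candidate recommendation and a reference. The abstract does not say which lexical metrics were used, so `unigram_f1` and the three example sentences are hypothetical and are not taken from the paper's benchmark.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-level F1 between two strings (lowercased, whitespace-split).

    Hypothetical stand-in for a lexical metric; the paper's actual
    metrics are not specified in the abstract.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up sentences for illustration only.
reference = "avoid codeine use in this patient"
safe_paraphrase = "codeine should be avoided for this patient"
unsafe_output = "continue codeine use in this patient"

safe_score = unigram_f1(safe_paraphrase, reference)
unsafe_score = unigram_f1(unsafe_output, reference)
print(f"safe paraphrase: {safe_score:.2f}, unsafe output: {unsafe_score:.2f}")
```

In this toy case the unsafe sentence shares more tokens with the reference than the safe paraphrase does, so a purely lexical score ranks it higher. A meaning-aware score with expert validation, such as the one the abstract describes, is one way to address that gap.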

