
Evaluation of large language models for discovery of gene set function.

Authors

Hu Mengzhou, Alkhairy Sahar, Lee Ingoo, Pillich Rudolf T, Bachelder Robin, Ideker Trey, Pratt Dexter

Affiliations

Department of Medicine, University of California San Diego, La Jolla, California, USA.

Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.

Publication information

Res Sq. 2023 Sep 18:rs.3.rs-3270331. doi: 10.21203/rs.3.rs-3270331/v1.

DOI:10.21203/rs.3.rs-3270331/v1
PMID:37790547
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10543283/
Abstract

Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI's GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. Benchmarking against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. In gene sets discovered in 'omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.
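The two steps the abstract describes, asking a model to name a gene set and then comparing that name with a reference name such as a Gene Ontology term, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the prompt wording, the function names `build_prompt`, `jaccard`, and `best_match`, and the use of token-overlap (Jaccard) similarity as a stand-in for whatever name-similarity measure the authors actually used. No model is called; the sketch only shows how a prompt could be assembled and how a proposed name could be scored against candidate reference names.

```python
import re


def build_prompt(genes):
    """Assemble a hypothetical naming prompt for one gene set.

    The wording is illustrative and is not the prompt used in the paper.
    """
    gene_list = ", ".join(sorted(set(genes)))  # de-duplicate, stable order
    return (
        "Propose a concise name for the most prominent shared function "
        f"of these genes, then justify it briefly: {gene_list}"
    )


def tokens(name):
    """Lowercase a name and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(name_a, name_b):
    """Token-overlap similarity in [0, 1]; a crude stand-in for a
    semantic similarity measure between two function names."""
    ta, tb = tokens(name_a), tokens(name_b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(proposed, reference_names):
    """Return the reference name most similar to the proposed name."""
    return max(reference_names, key=lambda ref: jaccard(proposed, ref))


if __name__ == "__main__":
    print(build_prompt(["TP53", "BRCA1", "TP53"]))
    references = [
        "apoptotic process",
        "regulation of apoptosis signaling",
        "DNA repair",
    ]
    print(best_match("regulation of apoptosis", references))
```

Token overlap misses synonyms (for example "apoptosis" versus "apoptotic process"), so a real evaluation would more plausibly rely on an embedding-based or ontology-aware similarity; the sketch only fixes the shape of the comparison.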

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e07a/10543283/2e146ccdc803/nihpp-rs3270331v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e07a/10543283/18022cd8ab6f/nihpp-rs3270331v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e07a/10543283/7ba0b2088727/nihpp-rs3270331v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e07a/10543283/556454c68a57/nihpp-rs3270331v1-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e07a/10543283/1d374ae1c9ed/nihpp-rs3270331v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e07a/10543283/294a9e366fc5/nihpp-rs3270331v1-f0002.jpg

