Hu Mengzhou, Alkhairy Sahar, Lee Ingoo, Pillich Rudolf T, Fong Dylan, Smith Kevin, Bachelder Robin, Ideker Trey, Pratt Dexter
Department of Medicine, University of California San Diego, La Jolla, CA, USA.
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
Nat Methods. 2025 Jan;22(1):82-91. doi: 10.1038/s41592-024-02525-x. Epub 2024 Nov 28.
Gene set enrichment is a mainstay of functional genomics, but it relies on gene function databases that are incomplete. Here we evaluate five large language models (LLMs) for their ability to discover the common functions represented by a gene set, supported by molecular rationale and a self-confidence assessment. For curated gene sets from Gene Ontology, GPT-4 suggests functions similar to the curated name in 73% of cases, with higher self-confidence predicting higher similarity. Conversely, random gene sets correctly yield zero confidence in 87% of cases. Other LLMs (GPT-3.5, Gemini Pro, Mixtral Instruct and Llama2 70b) vary in function recovery but are falsely confident for random sets. In gene clusters from omics data, GPT-4 identifies common functions for 45% of cases, fewer than functional enrichment but with higher specificity and gene coverage. Manual review of supporting rationale and citations finds these functions are largely verifiable. These results position LLMs as valuable omics assistants.