Evaluation of large language models for discovery of gene set function.

Author information

Hu Mengzhou, Alkhairy Sahar, Lee Ingoo, Pillich Rudolf T, Fong Dylan, Smith Kevin, Bachelder Robin, Ideker Trey, Pratt Dexter

Affiliations

Department of Medicine, University of California San Diego, La Jolla, California, USA.

Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.

Publication information

ArXiv. 2024 Apr 1:arXiv:2309.04019v2.

PMID:37731657

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10508824/

Abstract

Gene set analysis is a mainstay of functional genomics, but it relies on curated databases of gene functions that are incomplete. Here we evaluate five Large Language Models (LLMs) for their ability to discover the common biological functions represented by a gene set, substantiated by supporting rationale, citations and a confidence assessment. Benchmarking against canonical gene sets from the Gene Ontology, GPT-4 confidently recovered the curated name or a more general concept (73% of cases), while benchmarking against random gene sets correctly yielded zero confidence. Gemini-Pro and Mixtral-Instruct showed ability in naming but were falsely confident for random sets, whereas Llama2-70b had poor performance overall. In gene sets derived from 'omics data, GPT-4 identified novel functions not reported by classical functional enrichment (32% of cases), which independent review indicated were largely verifiable and not hallucinations. The ability to rapidly synthesize common gene functions positions LLMs as valuable 'omics assistants.
