Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University, 1081 HV, Amsterdam, The Netherlands.
Commun Biol. 2024 Jun 19;7(1):744. doi: 10.1038/s42003-024-06454-5.
Gene set enrichment analysis is foundational to the interpretation of high-throughput biology. Identifying enriched Gene Ontology (GO) terms or disease-associated gene sets within a list of gene effect sizes that represent experimental outcomes is an everyday task in life science, and it crucially depends on robust and sensitive statistical tools. We here present GOAT, a parameter-free algorithm for gene set enrichment analysis of preranked gene lists. The algorithm can precompute null distributions from standardized gene scores, enabling enrichment testing of the entire GO database in one second. Validations using synthetic data show that estimated gene set p-values are well calibrated under the null hypothesis and invariant to gene list length and gene set size. Application to various real-world proteomics and gene expression studies demonstrates that GOAT identifies more significant GO terms than current methods. GOAT is freely available as an R package and as a user-friendly online tool for gene set enrichment analyses that includes interactive data visualizations: https://ftwkoopmans.github.io/goat
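The abstract states that null distributions can be precomputed from standardized gene scores. One way to see why this is possible: if each gene's score is derived only from its rank, then every gene list of length N yields the same multiset of scores, so the null distribution of a gene set statistic depends only on N and the set size k, not on the data. The Python sketch below illustrates that idea under explicit assumptions: rank-scaled scores in [0, 1], the mean score of the genes in a set as the test statistic, and a one-sided Monte Carlo p-value. These choices, and the function names `rank_scores`, `precompute_null` and `empirical_pvalue`, are hypothetical; they are not GOAT's published scoring scheme or the API of the goat R package.

```python
# Illustrative sketch only: the scoring scheme, statistic and null model are
# assumptions for demonstration, not GOAT's published method.
import random
from bisect import bisect_left


def rank_scores(effect_sizes):
    """Map effect sizes to rank-based scores in [0, 1] (0 = smallest effect).

    The scores depend only on the ordering of the input and on the list
    length N, not on the raw values. Ties are broken by input order here.
    """
    n = len(effect_sizes)
    if n < 2:
        raise ValueError("need at least two genes")
    order = sorted(range(n), key=lambda i: effect_sizes[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank / (n - 1)
    return scores


def precompute_null(n_genes, set_sizes, n_perm=2000, seed=0):
    """Build a sorted null distribution of the mean score for each set size k.

    Because rank_scores always yields the same multiset of scores for a list
    of length n_genes, these nulls can be computed once and reused for every
    gene set (and every data set) with the same N and k.
    """
    rng = random.Random(seed)
    base = [r / (n_genes - 1) for r in range(n_genes)]
    null = {}
    for k in set_sizes:
        # Mean score of k genes drawn at random without replacement.
        means = [sum(rng.sample(base, k)) / k for _ in range(n_perm)]
        null[k] = sorted(means)
    return null


def empirical_pvalue(observed_mean, null_sorted):
    """One-sided p-value: share of null means >= observed, with a +1 correction."""
    n_ge = len(null_sorted) - bisect_left(null_sorted, observed_mean)
    return (n_ge + 1) / (len(null_sorted) + 1)


if __name__ == "__main__":
    # Toy data: 100 genes whose effect size equals their index.
    genes = ["g%d" % i for i in range(100)]
    scores = dict(zip(genes, rank_scores([float(i) for i in range(100)])))
    null = precompute_null(len(genes), set_sizes=[5])
    top_set = ["g95", "g96", "g97", "g98", "g99"]
    observed = sum(scores[g] for g in top_set) / len(top_set)
    print(empirical_pvalue(observed, null[5]))
```

Because the null in this sketch depends only on N and k, an analysis would compute it once per distinct set size and reuse it across all GO terms. A Monte Carlo null like this one also caps the smallest attainable p-value at 1/(n_perm + 1), which is one reason a tool might prefer a fitted or analytical null distribution instead.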