Candia Julián, Fantoni Giovanna, Delgado-Peraza Francheska, Shehadeh Nader, Tanaka Toshiko, Moaddel Ruin, Walker Keenan A, Ferrucci Luigi
Intramural Research Program, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA.
bioRxiv. 2025 Aug 2:2025.07.30.667673. doi: 10.1101/2025.07.30.667673.
Motivated by the lack of adequate tools for pathway enrichment analysis of SomaScan proteomic data, this work presents an approach specifically tailored to SomaScan. Starting from annotated gene sets, we developed a greedy, top-down procedure that uses 11K SomaScan data to iteratively identify strongly intra-correlated SOMAmer modules, termed "SomaModules". We generated two repositories based on the latest MSigDB and MitoCarta releases, which together contain more than 40,000 SOMAmer-based gene sets. These repositories can be used by any unstructured pathway enrichment analysis tool. We validated our results with two case examples: (i) Alzheimer's disease-specific pathways in a 7K SomaScan case-control study, and (ii) mitochondrial pathways using 11K SomaScan data linked to physical performance outcomes. Using Gene Set Enrichment Analysis (GSEA), we found that, in both examples, SomaModules had significantly higher enrichment than their original gene set counterparts. These findings were robust: they were not significantly affected by the choice of enrichment metric or of the Kolmogorov enrichment statistic used in the GSEA procedure. We provide access to all code, documentation, and data needed to reproduce our current repositories; these will also enable users to apply our framework to analyze SomaModules derived from other sources, including custom, user-generated gene sets.
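The abstract does not spell out the rules of the greedy, top-down module search, so the following is only one hypothetical reading of it, not the authors' implementation. It assumes that SOMAmer measurements are stored as a dict from SOMAmer ID to per-sample values, that cohesion is the mean pairwise Pearson correlation, and that a module is accepted once cohesion reaches an arbitrary `threshold` with at least `min_size` members. The names `greedy_modules`, `pearson`, `cohesion`, `threshold`, and `min_size` are illustrative placeholders.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cohesion(members, corr):
    """Mean pairwise correlation within a candidate module (assumed score)."""
    return mean(corr[a][b] for a, b in combinations(members, 2))


def greedy_modules(data, gene_set, threshold=0.8, min_size=3):
    """Hypothetical top-down split of one gene set into correlated modules.

    data: dict mapping SOMAmer ID -> list of measurements across samples.
    gene_set: SOMAmer IDs annotated to one pathway; IDs absent from data
    are ignored.
    """
    somamers = [s for s in gene_set if s in data]
    corr = {a: {b: pearson(data[a], data[b]) for b in somamers} for a in somamers}
    modules = []
    remaining = list(somamers)
    while len(remaining) >= min_size:
        module = list(remaining)
        # Top-down pruning: drop the member with the weakest average
        # correlation to the others until the module is cohesive enough.
        while len(module) >= min_size and cohesion(module, corr) < threshold:
            weakest = min(
                module,
                key=lambda s: mean(corr[s][t] for t in module if t != s),
            )
            module.remove(weakest)
        if len(module) < min_size:
            break  # no cohesive module left among the remaining SOMAmers
        modules.append(sorted(module))
        # Pruned SOMAmers are re-examined for further modules.
        remaining = [s for s in remaining if s not in module]
    return modules
```

Starting from the whole annotated gene set and pruning, rather than growing modules from seed pairs, is what makes this sketch top-down; a real pipeline would also need to decide how to treat negative correlations and how to pick the threshold.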