Extending gene set variation analysis with a reference dataset to stabilize scores.

Authors

Towle-Miller Lorin, Jordan William, Lockhart Alexandre, Freudenburg Johannes, Virmani Aman, Bergquist Mandy, Miecznikowski Jeffrey, Powley Will

Affiliations

GSK, Biostatistics, Collegeville, USA.

GSK, Computational Biology, Collegeville, USA.

Publication information

BMC Genomics. 2025 Jul 1;26(1):596. doi: 10.1186/s12864-025-11769-6.

DOI:10.1186/s12864-025-11769-6
PMID:40597576
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12211894/
Abstract

BACKGROUND

Biological pathways are sets of genes that jointly drive biological processes. Rather than analyzing genes individually, it is common practice to summarize sets of related genes using gene set variation analysis (GSVA). In short, GSVA summarizes a set of genes into a single score bounded between -1 and 1, where negative values suggest downregulation and positive values suggest upregulation. Although this interpretation is simple in theory, it depends on unbiased estimation of individual gene distributions. In the current version of GSVA, gene distributions are estimated using the input dataset (i.e., the scores are calculated based on the gene distributions from the same dataset). This becomes a major issue when study data does not adequately represent the full distribution of the population. For example, if RNA-seq data was collected on an imbalanced sample (e.g., more disease samples than healthy controls), it would be difficult to discern abnormalities in pathway activity since the gene distributions were estimated on a biased population. Therefore, we propose reference stabilizing GSVA (rsGSVA), a solution to this commonly ignored limitation by using reference datasets to estimate the gene distributions for a more stable GSVA score.
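The imbalance problem described above can be illustrated with a minimal sketch. It is not the GSVA algorithm itself: it only uses a plain empirical CDF for a single gene to show how the choice of background changes where a sample appears to sit. All expression values, the `ecdf_position` helper, and the choice of a healthy-only reference cohort are hypothetical assumptions made for illustration.

```python
from bisect import bisect_right

def ecdf_position(value, background):
    """Fraction of background values <= value (a plain empirical CDF)."""
    ordered = sorted(background)
    return bisect_right(ordered, value) / len(ordered)

# Hypothetical expression values for one gene (arbitrary units).
healthy = [4.8, 5.0, 5.1, 5.2, 4.9, 5.3, 5.0, 4.7, 5.1, 5.2]
disease = [7.9, 8.1, 8.0, 8.3, 7.8, 8.2, 8.4, 8.0, 7.7]

# Imbalanced study: nine disease samples and a single healthy control.
study = disease + healthy[:1]

# Hypothetical reference cohort (assumed here to be healthy samples only).
reference = healthy

sample = 8.0  # one disease sample from the study

# Background estimated from the study itself: the sample looks unremarkable.
within_study = ecdf_position(sample, study)

# Background estimated from the reference: the sample sits at the extreme.
against_reference = ecdf_position(sample, reference)

print(f"position within study:      {within_study:.2f}")
print(f"position against reference: {against_reference:.2f}")
```

Against the disease-heavy study, the sample lands near the middle of the estimated distribution, so its elevated expression is hard to see. Against the reference it lands at the top. What a suitable reference should contain is a decision for the analyst and is not specified in the abstract.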

RESULTS

rsGSVA shows comparable power to classic GSVA, singscore, and ssGSEA under ideal settings while demonstrating stable scores on sample subsets. An application on irritable bowel disease highlights interpretational advantages of rsGSVA to other methods in up/down regulation of inflammation signatures.

CONCLUSIONS

The rsGSVA technique enhances the GSVA functionality by incorporating a reference dataset. This integration of a reference dataset makes the enrichment scores independent of the input distribution and ensures their stability and reproducibility, even as samples are added or removed.
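The stability claim can be sketched with a toy score. This is an assumption-laden illustration, not the rsGSVA or GSVA statistic: `toy_score` simply averages centred empirical-CDF positions over a gene set, and the gene names, expression values, and reference cohort are all hypothetical. It shows only the structural point that a score computed against a fixed reference does not change when other study samples are removed.

```python
from bisect import bisect_right

def toy_score(sample, background, gene_set):
    """Mean of centred ECDF positions (2F - 1) over the gene set.

    Lies in [-1, 1]. A toy stand-in, not the GSVA statistic.
    """
    total = 0.0
    for gene in gene_set:
        ordered = sorted(row[gene] for row in background)
        position = bisect_right(ordered, sample[gene]) / len(ordered)
        total += 2 * position - 1
    return total / len(gene_set)

gene_set = ["A", "B"]  # hypothetical two-gene pathway

# Hypothetical fixed reference cohort.
reference = [
    {"A": 1.0, "B": 1.0},
    {"A": 2.0, "B": 2.0},
    {"A": 3.0, "B": 3.0},
    {"A": 4.0, "B": 4.0},
]

# Hypothetical study data; the first sample is the one we track.
study = [
    {"A": 2.5, "B": 3.5},
    {"A": 0.5, "B": 1.5},
    {"A": 3.5, "B": 4.5},
    {"A": 1.5, "B": 0.5},
]
subset = study[:2]  # same study after two samples are removed
target = study[0]

# Background taken from the input data: the score moves with the cohort.
input_full = toy_score(target, study, gene_set)
input_subset = toy_score(target, subset, gene_set)

# Background taken from the reference: the cohort plays no role.
ref_full = [toy_score(s, reference, gene_set) for s in study][0]
ref_subset = [toy_score(s, reference, gene_set) for s in subset][0]

print("input-based:    ", input_full, "->", input_subset)
print("reference-based:", ref_full, "->", ref_subset)
```

In this sketch the input-based score for the tracked sample shifts when the cohort shrinks, while the reference-based score is unchanged because the study cohort never enters the background estimate.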
