To Tweak or Not to Tweak. How Exploiting Flexibilities in Gene Set Analysis Leads to Overoptimism.

Authors

Wünsch Milena, Sauer Christina, Herrmann Moritz, Hinske Ludwig Christian, Boulesteix Anne-Laure

Affiliations

Institute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, LMU Munich, Munich, Germany.

Munich Center for Machine Learning, Munich, Germany.

Publication information

Biom J. 2025 Feb;67(1):e70016. doi: 10.1002/bimj.70016.

DOI:10.1002/bimj.70016
PMID:39698741
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11656295/
Abstract

Gene set analysis, a popular approach for analyzing high-throughput gene expression data, aims to identify sets of genes that show enriched expression patterns between two conditions. In addition to the multitude of methods available for this task, users are typically left with many options when creating the required input and specifying the internal parameters of the chosen method. This flexibility can lead to uncertainty about the "right" choice, further reinforced by a lack of evidence-based guidance. Especially when their statistical experience is scarce, this uncertainty might entice users to produce preferable results using a "trial-and-error" approach. While it may seem unproblematic at first glance, this practice can be viewed as a form of "cherry-picking" and cause an optimistic bias, rendering the results nonreplicable on independent data. After this problem has attracted a lot of attention in the context of classical hypothesis testing, we now aim to raise awareness of such overoptimism in the different and more complex context of gene set analyses. We mimic a hypothetical researcher who systematically selects the analysis variants yielding their preferred results, thereby considering three distinct goals they might pursue. Using a selection of popular gene set analysis methods, we tweak the results in this way for two frequently used benchmark gene expression data sets. Our study indicates that the potential for overoptimism is particularly high for a group of methods frequently used despite being commonly criticized. We conclude by providing practical recommendations to counter overoptimism in research findings in gene set analysis and beyond.
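The cherry-picking mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a toy setting in which one gene set has no true difference between two conditions, and a hypothetical researcher tries several ways of summarizing the gene set per sample (the `VARIANTS` dictionary, chosen here only for illustration) and reports the smallest permutation p-value. All function names and parameter values are assumptions made for this example.

```python
import random
import statistics


def gene_set_scores(expr, summarize):
    """Collapse a samples x genes matrix into one score per sample."""
    return [summarize(row) for row in expr]


def permutation_p_value(scores, n_group_a, n_perm, rng):
    """Two-sided permutation p-value for a difference in group means."""
    def abs_diff(values):
        group_a, group_b = values[:n_group_a], values[n_group_a:]
        return abs(statistics.fmean(group_a) - statistics.fmean(group_b))

    observed = abs_diff(scores)
    shuffled = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reassignment of condition labels
        if abs_diff(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical analysis variants: different per-sample summaries of the set.
VARIANTS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "max": max,
    "min": min,
}


def simulate(n_sim=200, n_per_group=10, n_genes=20, n_perm=99,
             alpha=0.05, seed=1):
    """Return (rate for one fixed variant, rate when picking the best variant).

    Data are pure noise, so every "significant" result is a false positive.
    """
    rng = random.Random(seed)
    fixed_hits = 0
    tweaked_hits = 0
    for _ in range(n_sim):
        expr = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)]
                for _ in range(2 * n_per_group)]
        p_values = {
            name: permutation_p_value(
                gene_set_scores(expr, summarize), n_per_group, n_perm, rng)
            for name, summarize in VARIANTS.items()
        }
        fixed_hits += p_values["mean"] <= alpha          # prespecified choice
        tweaked_hits += min(p_values.values()) <= alpha  # cherry-picked choice
    return fixed_hits / n_sim, tweaked_hits / n_sim


if __name__ == "__main__":
    fixed_rate, tweaked_rate = simulate()
    print(f"false positive rate, prespecified variant: {fixed_rate:.3f}")
    print(f"false positive rate, best of {len(VARIANTS)} variants: {tweaked_rate:.3f}")
```

Because the cherry-picked p-value is the minimum over all variants, its false positive rate can never be lower than that of the prespecified variant, and it typically rises as more variants are tried. Fixing the analysis variant before looking at the results removes this source of bias in the toy setting.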

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ced4/11656295/b5d481f8f1b5/BIMJ-67-e70016-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ced4/11656295/bc9664c229fb/BIMJ-67-e70016-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ced4/11656295/36a43242610d/BIMJ-67-e70016-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ced4/11656295/5a9fef537511/BIMJ-67-e70016-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ced4/11656295/9924a3e8ede0/BIMJ-67-e70016-g002.jpg

Similar articles

1
To Tweak or Not to Tweak. How Exploiting Flexibilities in Gene Set Analysis Leads to Overoptimism.
Biom J. 2025 Feb;67(1):e70016. doi: 10.1002/bimj.70016.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
The future of Cochrane Neonatal.
Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.
4
The Effectiveness of Integrated Care Pathways for Adults and Children in Health Care Settings: A Systematic Review.
JBI Libr Syst Rev. 2009;7(3):80-129. doi: 10.11124/01938924-200907030-00001.
5
Computer programs to estimate overoptimism in measures of discrimination for predicting the risk of cardiovascular diseases.
J Eval Clin Pract. 2013 Apr;19(2):358-62. doi: 10.1111/j.1365-2753.2012.01834.x. Epub 2012 Mar 12.
6
Good Choice, Bad Judgment: How Choice Under Uncertainty Generates Overoptimism.
Psychol Sci. 2018 Feb;29(2):254-265. doi: 10.1177/0956797617731637. Epub 2017 Dec 28.
7
Overoptimism in cross-validation when using partial least squares-discriminant analysis for omics data: a systematic study.
Anal Bioanal Chem. 2018 Sep;410(23):5981-5992. doi: 10.1007/s00216-018-1217-1. Epub 2018 Jun 29.
8
Erratum: High-Throughput Identification of Resistance to Pseudomonas syringae pv. Tomato in Tomato using Seedling Flood Assay.
J Vis Exp. 2023 Oct 18(200). doi: 10.3791/6576.
9
Dietary glycation compounds - implications for human health.
Crit Rev Toxicol. 2024 Sep;54(8):485-617. doi: 10.1080/10408444.2024.2362985. Epub 2024 Aug 16.
10
A decision-theory approach to interpretable set analysis for high-dimensional data.
Biometrics. 2013 Sep;69(3):614-23. doi: 10.1111/biom.12060. Epub 2013 Aug 2.
