Heading down the wrong pathway: on the influence of correlation within gene sets.

Affiliation

Department of Environmental Sciences & Engineering, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Publication

BMC Genomics. 2010 Oct 18;11:574. doi: 10.1186/1471-2164-11-574.

DOI:10.1186/1471-2164-11-574
PMID:20955544
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3091509/
Abstract

BACKGROUND

Analysis of microarray experiments often involves testing for the overrepresentation of pre-defined sets of genes among lists of genes deemed individually significant. Most popular gene set testing methods assume the independence of genes within each set, an assumption that is seriously violated, as extensive correlation between genes is a well-documented phenomenon.
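
The abstract does not name the independence-based procedure it evaluates. A representative test of this kind is the one-sided hypergeometric (Fisher's exact) test for overrepresentation. The minimal sketch below assumes a universe of `n_total` genes, of which `n_sig` were declared individually significant, and asks how surprising it is to find `k_in_set` of them in a set of `set_size` genes. The function and parameter names are hypothetical, and this is not presented as the paper's exact test. The calculation treats every gene as an exchangeable, independent draw, which is the assumption that correlation within gene sets breaks.

```python
from math import comb


def hypergeom_overrep_p(n_total: int, n_sig: int, set_size: int, k_in_set: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k_in_set).

    Hypothetical helper (not from the paper). X counts significant genes in a
    set of `set_size` genes drawn without replacement from `n_total` genes,
    `n_sig` of which are significant. Genes are treated as independent,
    exchangeable draws.
    """
    if not (0 <= n_sig <= n_total and 0 <= set_size <= n_total):
        raise ValueError("counts must satisfy 0 <= n_sig, set_size <= n_total")
    total_ways = comb(n_total, set_size)
    # Sum the hypergeometric mass from k_in_set up to the largest possible overlap.
    upper = min(set_size, n_sig)
    tail_ways = sum(
        comb(n_sig, i) * comb(n_total - n_sig, set_size - i)
        for i in range(max(k_in_set, 0), upper + 1)
    )
    return tail_ways / total_ways


# Hypothetical numbers: 20,000 genes, 1,000 called significant, 8 of them in a 50-gene set.
print(hypergeom_overrep_p(20_000, 1_000, 50, 8))
```

Nothing in this calculation looks at the expression data itself, so a set of tightly co-expressed genes that are all called significant together counts as strong evidence even when the treatment groups were assigned at random.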

RESULTS

We conducted a meta-analysis of over 200 datasets from the Gene Expression Omnibus in order to demonstrate the practical impact of strong gene correlation patterns that are highly consistent across experiments. We show that a common independence assumption-based gene set testing procedure produces very high false positive rates when applied to data sets for which treatment groups have been randomized, and that gene sets with high internal correlation are more likely to be declared significant. A reanalysis of the same datasets using an array resampling approach properly controls false positive rates, leading to more parsimonious and high-confidence gene set findings, which should facilitate pathway-based interpretation of the microarray data.
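
The contrast with array resampling can be sketched as follows. This minimal example assumes that "array resampling" means permuting the treatment labels across whole arrays (columns) and recomputing a gene set statistic each time. The set statistic (mean absolute Welch t-statistic over the set's genes), the add-one p-value formula, and all function names are illustrative assumptions, not necessarily the paper's exact procedure. Because each permutation moves an entire array, the correlation between genes within each array is carried into the null distribution instead of being assumed away.

```python
import random
from statistics import mean, variance


def welch_t(x: list[float], y: list[float]) -> float:
    """Welch two-sample t-statistic; each group needs at least two values."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se if se > 0 else 0.0


def set_statistic(expr: dict[str, list[float]], genes: list[str], labels: list[int]) -> float:
    """Mean |t| over the set's genes (an illustrative choice of set statistic).

    `labels` holds 1 (treated) or 0 (control) for each array.
    """
    scores = []
    for gene in genes:
        row = expr[gene]
        treated = [v for v, lab in zip(row, labels) if lab == 1]
        control = [v for v, lab in zip(row, labels) if lab == 0]
        scores.append(abs(welch_t(treated, control)))
    return mean(scores)


def array_resampling_p(expr, genes, labels, n_perm=999, seed=0):
    """Monte Carlo permutation p-value from shuffling labels across whole arrays."""
    rng = random.Random(seed)
    observed = set_statistic(expr, genes, labels)
    permuted = list(labels)
    exceed = 0
    for _ in range(n_perm):
        # Reassign labels to arrays; each array's gene values stay together,
        # so within-array gene-gene correlation is preserved.
        rng.shuffle(permuted)
        if set_statistic(expr, genes, permuted) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

For example, with `labels = [1] * 5 + [0] * 5` and an `expr` dictionary mapping each gene identifier to ten expression values (one per array), `array_resampling_p(expr, ["g1", "g2", "g3"], labels)` returns a permutation p-value for that three-gene set. When the labels carry no real signal, this p-value should hold its nominal level however strongly the set's genes are correlated, which is the property the Results attribute to array resampling.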

CONCLUSIONS

These findings call into question many of the gene set testing results in the literature and argue strongly for the adoption of resampling-based gene set testing criteria in the peer-reviewed biomedical literature.

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/100b/3091509/1a95f70bd188/1471-2164-11-574-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/100b/3091509/1b59cb90c632/1471-2164-11-574-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/100b/3091509/9a7cccf41fbe/1471-2164-11-574-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/100b/3091509/2a754dbcf874/1471-2164-11-574-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/100b/3091509/06bf5f219bc6/1471-2164-11-574-5.jpg

Similar articles

1. Heading down the wrong pathway: on the influence of correlation within gene sets.
   BMC Genomics. 2010 Oct 18;11:574. doi: 10.1186/1471-2164-11-574.
2. A resampling-based meta-analysis for detection of differential gene expression in breast cancer.
   BMC Cancer. 2008 Dec 30;8:396. doi: 10.1186/1471-2407-8-396.
3. RCMAT: a regularized covariance matrix approach to testing gene sets.
   BMC Bioinformatics. 2009 Sep 21;10:300. doi: 10.1186/1471-2105-10-300.
4. Group testing for pathway analysis improves comparability of different microarray datasets.
   Bioinformatics. 2006 Oct 15;22(20):2500-6. doi: 10.1093/bioinformatics/btl424. Epub 2006 Aug 7.
5. Random forests-based differential analysis of gene sets for gene expression data.
   Gene. 2013 Apr 10;518(1):179-86. doi: 10.1016/j.gene.2012.11.034. Epub 2012 Dec 6.
6. Gene set enrichment meta-learning analysis: next-generation sequencing versus microarrays.
   BMC Bioinformatics. 2010 Apr 8;11:176. doi: 10.1186/1471-2105-11-176.
7. Mining published lists of cancer related microarray experiments: identification of a gene expression signature having a critical role in cell-cycle control.
   BMC Bioinformatics. 2005 Dec 1;6 Suppl 4(Suppl 4):S14. doi: 10.1186/1471-2105-6-S4-S14.
8. A global meta-analysis of microarray expression data to predict unknown gene functions and estimate the literature-data divide.
   Bioinformatics. 2009 Jul 1;25(13):1694-701. doi: 10.1093/bioinformatics/btp290. Epub 2009 May 15.
9. Resampling reveals sample-level differential expression in clinical genome-wide studies.
   OMICS. 2009 Oct;13(5):381-96. doi: 10.1089/omi.2009.0027.
10. Investigating the effect of paralogs on microarray gene-set analysis.
   BMC Bioinformatics. 2011 Jan 24;12:29. doi: 10.1186/1471-2105-12-29.

Cited by

1. A workflow for human health hazard evaluation using transcriptomic data and Key Characteristics-based gene sets.
   Toxicol Sci. 2025 Jun 1;205(2):310-325. doi: 10.1093/toxsci/kfaf036.
2. Direction-aware functional class scoring enrichment analysis of infinium DNA methylation data.
   Epigenetics. 2024 Dec;19(1):2375022. doi: 10.1080/15592294.2024.2375022. Epub 2024 Jul 5.
3. Addressing erroneous scale assumptions in microbe and gene set enrichment analysis.
   PLoS Comput Biol. 2023 Nov 20;19(11):e1011659. doi: 10.1371/journal.pcbi.1011659. eCollection 2023 Nov.
4. FastMix: a versatile data integration pipeline for cell type-specific biomarker inference.
   Bioinformatics. 2022 Oct 14;38(20):4735-4744. doi: 10.1093/bioinformatics/btac585.
5. TWO-SIGMA-G: a new competitive gene set testing framework for scRNA-seq data accounting for inter-gene and cell-cell correlation.
   Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac084.
6. Analysis of the immune landscape in virus-induced cancers using a novel integrative mechanism discovery approach.
   Comput Struct Biotechnol J. 2021 Nov 18;19:6240-6254. doi: 10.1016/j.csbj.2021.11.013. eCollection 2021.
7. Test-statistic correlation and data-row correlation.
   Stat Probab Lett. 2020 Dec;167. doi: 10.1016/j.spl.2020.108903. Epub 2020 Aug 14.
8. Application of Transcriptional Gene Modules to Analysis of ' Gene Expression Data.
   G3 (Bethesda). 2020 Oct 5;10(10):3623-3638. doi: 10.1534/g3.120.401270.
9. Gene Set Analysis: Challenges, Opportunities, and Future Research.
   Front Genet. 2020 Jun 30;11:654. doi: 10.3389/fgene.2020.00654. eCollection 2020.
10. Network hub-node prioritization of gene regulation with intra-network association.
   BMC Bioinformatics. 2020 Mar 12;21(1):101. doi: 10.1186/s12859-020-3444-7.

References

1. Testing the additional predictive value of high-dimensional molecular data.
   BMC Bioinformatics. 2010 Feb 8;11:78. doi: 10.1186/1471-2105-11-78.
2. Gene set internal coherence in the context of functional profiling.
   BMC Genomics. 2009 Apr 27;10:197. doi: 10.1186/1471-2164-10-197.
3. A biological evaluation of six gene set analysis methods for identification of differentially expressed pathways in microarray data.
   Cancer Inform. 2008;6:357-68. doi: 10.4137/cin.s867. Epub 2008 Jun 20.
4. A general modular framework for gene set enrichment analysis.
   BMC Bioinformatics. 2009 Feb 3;10:47. doi: 10.1186/1471-2105-10-47.
5. Microarray-based gene set analysis: a comparison of current methods.
   BMC Bioinformatics. 2008 Nov 27;9:502. doi: 10.1186/1471-2105-9-502.
6. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists.
   Nucleic Acids Res. 2009 Jan;37(1):1-13. doi: 10.1093/nar/gkn923. Epub 2008 Nov 25.
7. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus.
   Bioinformatics. 2008 Dec 1;24(23):2798-800. doi: 10.1093/bioinformatics/btn520. Epub 2008 Oct 7.
8. Significance levels for studies with correlated test statistics.
   Biostatistics. 2008 Jul;9(3):458-66. doi: 10.1093/biostatistics/kxm047. Epub 2007 Dec 18.
9. Activation of inflammation/NF-kappaB signaling in infants born to arsenic-exposed mothers.
   PLoS Genet. 2007 Nov;3(11):e207. doi: 10.1371/journal.pgen.0030207.
10. GlobalANCOVA: exploration and assessment of gene group effects.
   Bioinformatics. 2008 Jan 1;24(1):78-85. doi: 10.1093/bioinformatics/btm531. Epub 2007 Nov 17.