Uniform approximation is more appropriate for Wilcoxon Rank-Sum Test in gene set analysis.

Affiliation

Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana, United States of America.

Publication information

PLoS One. 2012;7(2):e31505. doi: 10.1371/journal.pone.0031505. Epub 2012 Feb 7.

DOI:10.1371/journal.pone.0031505
PMID:22347488
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3274536/
Abstract

Gene set analysis is widely used to facilitate biological interpretation in analyses of differential expression from high-throughput profiling data. The Wilcoxon Rank-Sum (WRS) test is one of the commonly used methods in gene set enrichment analysis. It compares the ranks of genes in a gene set against those of genes outside the gene set. This method is easy to implement, and it eliminates the dichotomization of genes into significant and non-significant in competitive hypothesis testing. Because of the large number of genes being examined, it is impractical to calculate the exact null distribution for the WRS test. Therefore, the normal distribution is commonly used as an approximation. However, as we demonstrate in this paper, the normal approximation is problematic when a gene set with a relatively small number of genes is tested against the large number of genes in the complementary set. In this situation, a uniform approximation is substantially more powerful, more accurate, and less computationally intensive. We demonstrate the advantage of the uniform approximation in Gene Ontology (GO) term analysis using simulations and real data sets.
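The contrast the abstract draws can be illustrated with a minimal Python sketch. It is not the authors' implementation. The rank sum and its normal approximation are the textbook WRS quantities. The "uniform" function rests on an assumption the abstract does not spell out: that for a small gene set the ranks, rescaled to (0, 1), behave like independent Uniform(0,1) draws, so their sum follows the Irwin-Hall distribution. The function names and the midpoint rescaling (rank - 0.5)/N are hypothetical choices made for illustration.

```python
import math

def rank_sum(scores, in_set):
    """Sum of the 1-based ranks of the gene-set members (assumes no ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = {gene: r + 1 for r, gene in enumerate(order)}
    return sum(rank[i] for i in in_set)

def normal_upper_p(w, m, n_total):
    """Upper-tail p-value from the usual normal approximation to the rank sum."""
    n = n_total - m                      # size of the complementary set
    mean = m * (n_total + 1) / 2
    var = m * n * (n_total + 1) / 12
    z = (w - 0.5 - mean) / math.sqrt(var)  # with continuity correction
    return 0.5 * math.erfc(z / math.sqrt(2))

def irwin_hall_cdf(x, m):
    """CDF of the sum of m independent Uniform(0,1) variables.

    Uses the alternating-sum formula, which is only numerically
    reliable for small m.
    """
    if x <= 0:
        return 0.0
    if x >= m:
        return 1.0
    total = sum((-1) ** k * math.comb(m, k) * (x - k) ** m
                for k in range(math.floor(x) + 1))
    return total / math.factorial(m)

def uniform_upper_p(w, m, n_total):
    """Upper-tail p-value under the assumed uniform-sum (Irwin-Hall) model."""
    # Assumption: each rank r is rescaled to (r - 0.5) / N before summing.
    x = (w - 0.5 * m) / n_total
    # The Irwin-Hall distribution is symmetric about m / 2, so the upper
    # tail at x equals the CDF at m - x; this avoids cancellation near 1.
    return irwin_hall_cdf(m - x, m)

if __name__ == "__main__":
    n_total, m = 1000, 3
    scores = list(range(n_total))        # toy scores, one per gene
    gene_set = [997, 998, 999]           # the three top-ranked genes
    w = rank_sum(scores, gene_set)
    print("rank sum:", w)
    print("normal approximation p:", normal_upper_p(w, m, n_total))
    print("uniform-sum approximation p:", uniform_upper_p(w, m, n_total))
```

In this toy case the exact one-sided p-value is 1/C(1000, 3), about 6.0e-9. The normal approximation returns roughly 1.4e-3, while the uniform-sum sketch returns roughly 1.5e-8, which is the kind of gap in the tail that the abstract describes for small gene sets tested against a large complementary set.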

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/123f/3274536/5ce4a39eefa3/pone.0031505.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/123f/3274536/1ecdbd67bf7d/pone.0031505.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/123f/3274536/c79008f39558/pone.0031505.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/123f/3274536/d7202ac800f1/pone.0031505.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/123f/3274536/f5a5220585e2/pone.0031505.g005.jpg