A Bayesian approach to efficient differential allocation for resampling-based significance testing.

Authors

Jensen Shane T, Soi Sameer, Wang Li-San

Affiliation

Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA.

Publication

BMC Bioinformatics. 2009 Jun 28;10:198. doi: 10.1186/1471-2105-10-198.

DOI:10.1186/1471-2105-10-198
PMID:19558706
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2718927/
Abstract

BACKGROUND

Large-scale statistical analyses have become hallmarks of post-genomic era biological research due to advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption in the null distribution such as normality may be unreasonable, and resampling-based p-values are the preferred procedure for establishing statistical significance. Using resampling-based procedures for multiple testing is computationally intensive and typically requires large numbers of resamples.
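To make the cost concrete, here is a minimal sketch of a resampling-based p-value for a single test, using a permutation test on the difference in group means. The function name `permutation_p_value`, the choice of test statistic, and the add-one correction are illustrative assumptions, not details taken from the paper. With uniform allocation, this routine would be called with the same `n_resamples` for every one of the many hypotheses.

```python
import random


def permutation_p_value(group_a, group_b, n_resamples, rng):
    """Estimate a two-sided permutation p-value for a difference in means."""
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    exceed = 0
    for _ in range(n_resamples):
        # Shuffle the group labels by permuting the pooled values.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        stat = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_resamples + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    treated = [10, 11, 12, 13, 14]
    control = [0, 1, 2, 3, 4]
    print(permutation_p_value(treated, control, 999, rng))
```

The smallest p-value this estimator can report is 1 / (n_resamples + 1), which is why resolving small p-values across thousands of tests requires so many resamples.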

RESULTS

We present a new approach to more efficiently assign resamples (such as bootstrap samples or permutations) within a nonparametric multiple testing framework. We formulated a Bayesian-inspired approach to this problem, and devised an algorithm that adapts the assignment of resamples iteratively with negligible space and running time overhead. In two experimental studies, a breast cancer microarray dataset and a genome-wide association study dataset for Parkinson's disease, we demonstrated that our differential allocation procedure is substantially more accurate than the traditional uniform resample allocation.
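The idea of adapting the allocation iteratively can be sketched as follows. This is not the authors' algorithm (their R code is at the URL given in the conclusion); it is a simplified illustration under stated assumptions. Each test keeps a Beta(1 + exceedances, 1 + non-exceedances) posterior for its p-value, and each round splits a batch of new resamples in proportion to a hypothetical score that is high when the posterior mean lies close to the significance threshold `alpha`. The scoring rule, the function names, and the Bernoulli simulation of a resample are all assumptions made for this example.

```python
import math
import random


def allocation_weights(exceed, drawn, alpha):
    """Score each test by how close its posterior p-value is to alpha."""
    weights = []
    for e, n in zip(exceed, drawn):
        mean = (e + 1) / (n + 2)           # Beta(1+e, 1+n-e) posterior mean
        var = mean * (1 - mean) / (n + 3)  # Beta(1+e, 1+n-e) posterior variance
        z = (mean - alpha) / math.sqrt(var)
        # Gaussian-shaped score; the small constant avoids an all-zero total.
        weights.append(math.exp(-0.5 * z * z) + 1e-12)
    return weights


def differential_allocation(draw, n_tests, alpha, pilot, batch, rounds, rng):
    """Adaptively spread resamples over tests; returns (exceed, drawn)."""
    exceed = [0] * n_tests
    drawn = [0] * n_tests
    # Pilot stage: every test receives the same small number of resamples.
    for i in range(n_tests):
        for _ in range(pilot):
            exceed[i] += draw(i, rng)
        drawn[i] = pilot
    # Adaptive stage: split each batch in proportion to the current scores.
    for _ in range(rounds):
        weights = allocation_weights(exceed, drawn, alpha)
        total = sum(weights)
        for i in range(n_tests):
            extra = int(batch * weights[i] / total)
            for _ in range(extra):
                exceed[i] += draw(i, rng)
            drawn[i] += extra
    return exceed, drawn


def make_simulated_draw(true_p):
    """Simulate one resample: 1 if the resampled statistic exceeds the observed."""
    def draw(i, rng):
        return 1 if rng.random() < true_p[i] else 0
    return draw


if __name__ == "__main__":
    rng = random.Random(0)
    draw = make_simulated_draw([0.5, 0.05, 0.001])
    exceed, drawn = differential_allocation(draw, 3, 0.05, 100, 3000, 5, rng)
    for e, n in zip(exceed, drawn):
        print(n, (e + 1) / (n + 2))
```

The only state kept per test is two counters, so the bookkeeping adds little memory or running time on top of the resampling itself. In a real analysis, `draw` would compute the test statistic on one permuted or bootstrapped dataset rather than sample a Bernoulli variable.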

CONCLUSION

Our experiments demonstrate that using a more sophisticated allocation strategy can improve our inference for hypothesis testing without a drastic increase in the amount of computation on randomized data. Moreover, we gain more improvement in efficiency when the number of tests is large. R code for our algorithm and the shortcut method are available at http://people.pcbi.upenn.edu/~lswang/pub/bmc2009/.
