Accurate and Ultra-Efficient p-Value Calculation for Higher Criticism Tests.

Author information

Wang Wenjia, Fang Yusi, Chang Chung, Tseng George C

Affiliations

Department of Biostatistics, University of Pittsburgh.

Department of Applied Mathematics, National Sun Yat-sen University.

Publication information

J Comput Graph Stat. 2024;33(2):463-476. doi: 10.1080/10618600.2023.2270720. Epub 2023 Nov 27.

Abstract

In modern data science, the higher criticism (HC) method is effective for detecting rare and weak signals. The computation, however, has long been an issue when the number of p-values combined and/or the number of repeated HC tests is large. Some computing methods have been developed, but they all have significant shortcomings, especially when a stringent significance level is required. In this paper, we propose an accurate and highly efficient computing strategy for four variations of HC. Specifically, we propose an unbiased cross-entropy-based importance sampling method to benchmark all existing computing methods, and develop a modified SetTest method (MST) that resolves numerical issues of the existing SetTest approach. We further develop an ultra-fast approach (UFI) combining pre-calculated statistical tables and cubic spline interpolation. Finally, following extensive simulations, we provide a computing strategy integrating MST, UFI and other existing methods, together with the R package "HCp", for virtually any number of combined p-values and for small p-values. The method is applied to a COVID-19 disease surveillance example for spatio-temporal outbreak detection from case numbers over 804 days in 3,342 counties in the United States. Results confirm the viability of the computing strategy for large-scale inferences. Supplementary materials for this article are available online.
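To make the computational issue concrete, the following is a minimal Python sketch, not the paper's MST, UFI, or importance sampling method. It assumes the classical Donoho-Jin form of the HC statistic (the abstract does not say which four variations are covered), and the function names `hc_statistic` and `naive_mc_pvalue` are hypothetical. It estimates the p-value by plain Monte Carlo under the global null, whose smallest reportable value is 1/(n_sim + 1). This is one reason naive simulation becomes impractical at stringent significance levels.

```python
import math
import random


def hc_statistic(pvalues):
    """Classical HC statistic (assumed form, not necessarily the paper's):
    max over i of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    p = sorted(pvalues)
    n = len(p)
    best = -math.inf
    for i, p_i in enumerate(p, start=1):
        if 0.0 < p_i < 1.0:  # skip degenerate values to avoid division by zero
            z = math.sqrt(n) * (i / n - p_i) / math.sqrt(p_i * (1.0 - p_i))
            best = max(best, z)
    return best


def naive_mc_pvalue(observed, n, n_sim=2000, seed=0):
    """Plain Monte Carlo p-value under the global null (independent uniform p-values).
    The estimate can never be smaller than 1 / (n_sim + 1)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        null_p = [rng.random() for _ in range(n)]
        if hc_statistic(null_p) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Illustrative input only: 3 small p-values among 47 unremarkable ones.
    rng = random.Random(1)
    pvals = [1e-5, 2e-4, 5e-4] + [rng.random() for _ in range(47)]
    obs = hc_statistic(pvals)
    print("HC statistic:", obs)
    print("naive Monte Carlo p-value:", naive_mc_pvalue(obs, n=len(pvals)))
```

With `n_sim` simulations the estimate is bounded below by 1/(n_sim + 1), so resolving very small p-values this way would require a prohibitive number of simulated null data sets. Analytic, table-based, or importance sampling approaches such as those named in the abstract are aimed at this regime.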
