Wang Wenjia, Fang Yusi, Chang Chung, Tseng George C
Department of Biostatistics, University of Pittsburgh.
Department of Applied Mathematics, National Sun Yat-sen University.
J Comput Graph Stat. 2024;33(2):463-476. doi: 10.1080/10618600.2023.2270720. Epub 2023 Nov 27.
In modern data science, the higher criticism (HC) method is effective for detecting rare and weak signals. Its computation, however, has long been a bottleneck when the number of p-values combined and/or the number of repeated HC tests is large. Several computing methods have been developed, but all have significant shortcomings, especially when a stringent significance level is required. In this paper, we propose an accurate and highly efficient computing strategy for four variations of HC. Specifically, we propose an unbiased cross-entropy-based importance sampling method to benchmark all existing computing methods, and we develop a modified SetTest method (MST) that resolves numerical issues of the existing SetTest approach. We further develop an ultra-fast approach (UFI) that combines pre-calculated statistical tables with cubic spline interpolation. Finally, following extensive simulations, we provide a computing strategy, implemented in the R package "HCp", that integrates MST, UFI and other existing methods for virtually any number of combined p-values and for small p-values. The method is applied to a COVID-19 disease surveillance example for spatio-temporal outbreak detection from case counts over 804 days in 3,342 counties in the United States. The results confirm the viability of the computing strategy for large-scale inference. Supplementary materials for this article are available online.
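To make the computational issue concrete, the following is a minimal sketch of one common form of the HC statistic together with a naive Monte Carlo p-value. It is not the method of the paper and does not use the "HCp" package: the abstract does not state which four HC variations are covered, so the Donoho-Jin form below is an assumption, and the function names `hc_statistic` and `naive_mc_pvalue` are hypothetical.

```python
import math
import random


def hc_statistic(pvalues):
    """One common (Donoho-Jin style) HC statistic; assumed form, see lead-in.

    HC = max over i of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the sorted input p-values.
    """
    n = len(pvalues)
    ordered = sorted(pvalues)
    best = -math.inf
    for i, p in enumerate(ordered, start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # the standardizing variance is zero here; skip the term
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


def naive_mc_pvalue(observed, n, n_sim=2000, seed=0):
    """Naive Monte Carlo p-value of an observed HC value under the global null.

    Under the null, the n input p-values are independent Uniform(0, 1).
    The smallest value this estimator can return is 1 / (n_sim + 1).
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        null_p = [rng.random() for _ in range(n)]
        if hc_statistic(null_p) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # 95 null p-values plus 5 small ones standing in for rare, weak signals.
    pvals = [rng.random() for _ in range(95)] + [1e-4, 5e-4, 1e-3, 2e-3, 3e-3]
    hc = hc_statistic(pvals)
    print("HC statistic:", hc)
    print("naive Monte Carlo p-value:", naive_mc_pvalue(hc, n=len(pvals)))
```

Because the estimate cannot fall below 1 / (n_sim + 1), resolving a very stringent significance level this way would require an impractically large number of simulations, and the cost multiplies when many HC tests are repeated. That is the kind of limitation that importance sampling and table-based interpolation are meant to address.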