High-dimensional randomization-based inference capitalizing on classical design and modern computing.

Author information

Bind Marie-Abele C, Rubin D B

Affiliations

Biostatistics Center, Massachusetts General Hospital, 50, Staniford Street, Boston, MA 02114 USA.

Yau Center for Mathematical Sciences, Tsinghua University, Beijing, China.

Publication information

Behaviormetrika. 2023;50(1):9-26. doi: 10.1007/s41237-022-00183-x. Epub 2022 Sep 28.

Abstract

A common complication that can arise with analyses of high-dimensional data is the repeated use of hypothesis tests. A second complication, especially with small samples, is the reliance on asymptotic p-values. Our proposed approach for addressing both complications uses a scientifically motivated scalar summary statistic, and although not entirely novel, seems rarely used. The method is illustrated using a crossover study of seventeen participants examining the effect of exposure to ozone versus clean air on the DNA methylome, where the multivariate outcome involved 484,531 genomic locations. Our proposed test yields a single null randomization distribution, and thus a single Fisher-exact p-value that is statistically valid whatever the structure of the data. However, the relevance and power of the resultant test require the careful a priori selection of a single test statistic. The common practice of using asymptotic p-values or meaningless thresholds for "significance" is inapposite in general.
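The idea of a single scalar statistic with a single null randomization distribution can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis: it assumes each participant's exposure order was randomized independently with equal probability, so that under Fisher's sharp null hypothesis swapping the labels flips the sign of that participant's within-person differences. The statistic `scalar_statistic` (largest absolute mean difference across locations), the function names, and the toy data are all hypothetical choices made for this example.

```python
import itertools

import numpy as np


def scalar_statistic(diffs):
    """Hypothetical scalar summary of a (participants x locations) array of
    within-person differences: the largest absolute mean difference."""
    return float(np.max(np.abs(diffs.mean(axis=0))))


def exact_randomization_p_value(diffs, statistic=scalar_statistic):
    """Fisher-exact p-value by enumerating all 2**n sign-flip assignments.

    Assumption: independent, equal-probability randomization of exposure
    order per participant, so each sign pattern is equally likely under
    the sharp null of no effect for any participant at any location.
    """
    diffs = np.asarray(diffs, dtype=float)
    n_participants = diffs.shape[0]
    observed = statistic(diffs)

    as_extreme = 0
    total = 0
    for signs in itertools.product((1.0, -1.0), repeat=n_participants):
        # Flip every location of a participant together, which preserves
        # the dependence structure across locations.
        flipped = diffs * np.asarray(signs)[:, None]
        # Small tolerance guards against floating-point ties.
        if statistic(flipped) >= observed - 1e-12:
            as_extreme += 1
        total += 1
    return as_extreme / total


# Toy data only: 6 participants and 5 locations, not the study's data.
rng = np.random.default_rng(0)
toy_diffs = rng.normal(size=(6, 5))
p_value = exact_randomization_p_value(toy_diffs)
print(f"Fisher-exact p-value on toy data: {p_value:.4f}")
```

With seventeen participants the same enumeration would cover 2^17 = 131,072 assignments. Because the statistic is recomputed on the full outcome array for each assignment, only one p-value is produced regardless of how many locations are measured.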

