Department of Statistics, University of California, Berkeley, CA 94720.
Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom.
Proc Natl Acad Sci U S A. 2020 May 5;117(18):9787-9792. doi: 10.1073/pnas.1912957117. Epub 2020 Apr 22.
Tree structures, which show hierarchical relationships and latent structure among samples, are ubiquitous in genomic and biomedical sciences. A common question in many studies is whether a response variable measured on each sample is associated with the latent group structure represented by a given tree. Currently, this question is addressed on an ad hoc basis, usually by requiring the user to choose a number of clusters to prune out of the tree and then testing those clusters against the response variable. Here, we present a method with statistical guarantees that tests for association between the response variable and a fixed tree structure across all levels of the tree hierarchy with high power while accounting for the overall false positive error rate. This enhances the robustness and reproducibility of such findings.