Bharath Karthik, Kambadur Prabhanjan, Dey Dipak K, Rao Arvind, Baladandayuthapani Veerabhadran
School of Mathematical Sciences, University of Nottingham, Nottingham, NG7 2RD, U.K.
Bloomberg LP, New York, NY 10022, USA.
J Am Stat Assoc. 2017;112(520):1733-1743. doi: 10.1080/01621459.2016.1240081. Epub 2017 Aug 7.
We develop a general statistical framework for the analysis of, and inference on, large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton-Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ² and F random variables. We illustrate our methods on an important application: detecting tumor heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients.