Department of Imaging Sciences, Institute of Clinical Sciences, Hammersmith Campus, Statistics Section, Department of Mathematics, South Kensington Campus and Department of Surgery and Cancer, Ovarian Cancer Action Research Centre, Hammersmith Campus, Imperial College London, London W12 0NN, UK.
Bioinformatics. 2013 Oct 15;29(20):2555-63. doi: 10.1093/bioinformatics/btt450. Epub 2013 Aug 5.
Due to rapid technological advances, a wide range of different measurements can be obtained from a given biological sample including single nucleotide polymorphisms, copy number variation, gene expression levels, DNA methylation and proteomic profiles. Each of these distinct measurements provides the means to characterize a certain aspect of biological diversity, and a fundamental problem of broad interest concerns the discovery of shared patterns of variation across different data types. Such data types are heterogeneous in the sense that they represent measurements taken at different scales or represented by different data structures.
We propose a distance-based statistical test, the generalized RV (GRV) test, to assess whether there is a common and non-random pattern of variability between paired biological measurements obtained from the same random sample. The measurements enter the test through the use of two distance measures, which can be chosen to capture a particular aspect of the data. An approximate null distribution is proposed to compute P-values in closed form, without the need to perform costly Monte Carlo permutation procedures. Compared with the classical Mantel test for association between distance matrices, the GRV test has been found to be more powerful in a number of simulation settings. We also demonstrate how the GRV test can be used to detect biological pathways in which genetic variability is associated with variation in gene expression levels in an ovarian cancer sample, and present results obtained from two independent cohorts.
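To make the idea of a distance-based association test concrete, the following Python sketch computes an RV-type coefficient between two distance matrices and assesses it by permutation. It is an illustration under stated assumptions, not the authors' implementation: the statistic is assumed to be the RV coefficient applied to Gower-centered squared distance matrices, the P-value is obtained by Monte Carlo permutation (the very procedure the closed-form approximation is meant to avoid), and all function names are hypothetical.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the n x n matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gower_center(D):
    """Double-center squared distances: G = -1/2 * C D^2 C, with C = I - 11'/n."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * C @ (D ** 2) @ C


def rv_statistic(Gx, Gy):
    """RV-type coefficient tr(Gx Gy) / sqrt(tr(Gx^2) tr(Gy^2)) for symmetric inputs."""
    return np.sum(Gx * Gy) / np.sqrt(np.sum(Gx * Gx) * np.sum(Gy * Gy))


def permutation_pvalue(Dx, Dy, n_perm=999, seed=0):
    """Permutation P-value for the RV-type statistic (assumed stand-in for the
    closed-form null approximation described in the abstract)."""
    rng = np.random.default_rng(seed)
    Gx, Gy = gower_center(Dx), gower_center(Dy)
    observed = rv_statistic(Gx, Gy)
    n = Dx.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute the samples of one data type, keeping rows and columns aligned.
        if rv_statistic(Gx, Gy[np.ix_(p, p)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Any pair of distance measures can be substituted for `pairwise_euclidean`, for example a genotype-specific distance for one data type and a Euclidean distance on expression levels for the other; only the two n x n distance matrices enter the test.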
R code to compute the GRV test is freely available from http://www2.imperial.ac.uk/~gmontana