Departamento de Economía, Métodos Cuantitativos e Historia Económica, Universidad Pablo de Olavide, Seville, Spain.
Biostatistics. 2010 Apr;11(2):254-64. doi: 10.1093/biostatistics/kxp056. Epub 2010 Jan 11.
Microarray experiments provide data on the expression levels of thousands of genes and, therefore, statistical methods applicable to the analysis of such high-dimensional data are needed. In this paper, we propose robust nonparametric tools for the description and analysis of microarray data based on the concept of functional depth, which measures the centrality of an observation within a sample. We show that this concept can be easily adapted to high-dimensional observations and, in particular, to gene expression data. This allows the development of the following depth-based inference tools: (1) a scale curve for measuring and visualizing the dispersion of a set of points, (2) a rank test for deciding if 2 groups of multidimensional observations come from the same population, and (3) supervised classification techniques for assigning a new sample to one of G given groups. We apply these methods to microarray data, and to simulated data including contaminated models, and show that they are robust, efficient, and competitive with other procedures proposed in the literature, outperforming them in some situations.
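To make the notion of depth concrete, here is a minimal sketch of one functional depth, the modified band depth with bands formed by pairs of curves. The abstract does not say which depth the authors use or how they compute it, so the choice of depth, the function name `modified_band_depth`, and the toy data are all assumptions made for illustration. Each row of `X` is treated as one observation (for example, one sample's expression profile across genes). An observation's depth is the average proportion of coordinates at which it lies inside the band spanned by a pair of observations.

```python
from itertools import combinations

import numpy as np


def modified_band_depth(X):
    """Illustrative modified band depth using bands built from pairs of rows.

    X has shape (n, p): n observations, each with p coordinates.
    Returns an array of n depth values in [0, 1]; larger means more central.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    depth = np.zeros(n)
    pairs = list(combinations(range(n), 2))
    for j, k in pairs:
        # Pointwise band delimited by observations j and k.
        lower = np.minimum(X[j], X[k])
        upper = np.maximum(X[j], X[k])
        # For every observation, the fraction of coordinates inside the band.
        inside = (X >= lower) & (X <= upper)
        depth += inside.mean(axis=1)
    return depth / len(pairs)


# Toy data (hypothetical): three flat profiles at levels 0, 1 and 2.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])
depths = modified_band_depth(X)
# Ordering from the deepest (most central) observation outward.
center_outward = np.argsort(-depths)
```

In this toy example the middle profile lies inside every band and receives the largest depth. Sorting observations by decreasing depth gives a center-outward ordering, which is the kind of ranking that depth-based scale curves, rank tests, and classifiers build on. The loop over all pairs costs on the order of n² · p operations, so this sketch is only suitable for small samples.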