Schäfer Juliane, Strimmer Korbinian
Department of Statistics, University of Munich, Germany.
Stat Appl Genet Mol Biol. 2005;4:Article32. doi: 10.2202/1544-6115.1175. Epub 2005 Nov 14.
Inferring large-scale covariance matrices from sparse genomic data is a ubiquitous problem in bioinformatics. Clearly, the widely used standard covariance and correlation estimators are ill-suited for this purpose. As a statistically efficient and computationally fast alternative, we propose a novel shrinkage covariance estimator that exploits the Ledoit-Wolf (2003) lemma for analytic calculation of the optimal shrinkage intensity. Subsequently, we apply this improved covariance estimator (which has guaranteed minimum mean squared error, is well-conditioned, and is always positive definite even for small sample sizes) to the problem of inferring large-scale gene association networks. We show that it performs very favorably compared to competing approaches, both in simulations and in application to real expression data.
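The idea of shrinking the sample covariance toward a simpler target, with an intensity computed analytically rather than by cross-validation, can be illustrated with a minimal sketch. It assumes a diagonal target that keeps the sample variances and shrinks only the off-diagonal entries, with the intensity estimated as the summed estimated variances of the off-diagonal sample covariances divided by their summed squares. The function name `shrinkage_covariance`, the choice of target, and the clipping of the intensity to [0, 1] are assumptions made for illustration; this is not necessarily the exact estimator proposed in the paper.

```python
import numpy as np

def shrinkage_covariance(X):
    """Shrink the sample covariance of X (n samples x p variables)
    toward its own diagonal. Hypothetical helper for illustration."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # center each variable

    # W[k, i, j] = xc[k, i] * xc[k, j]; needs O(n * p^2) memory,
    # which is acceptable for a sketch but not for very large p.
    W = Xc[:, :, None] * Xc[:, None, :]
    W_bar = W.mean(axis=0)

    S = n / (n - 1) * W_bar                      # unbiased sample covariance
    # Estimated variance of each entry of S.
    var_S = n / (n - 1) ** 3 * ((W - W_bar) ** 2).sum(axis=0)

    off = ~np.eye(p, dtype=bool)                 # off-diagonal mask
    denom = (S[off] ** 2).sum()
    lam = 1.0 if denom == 0 else var_S[off].sum() / denom
    lam = float(np.clip(lam, 0.0, 1.0))          # keep intensity in [0, 1]

    # Convex combination of the diagonal target and the sample covariance.
    S_star = lam * np.diag(np.diag(S)) + (1.0 - lam) * S
    return S_star, lam

# "Small n, large p" usage: 10 samples, 30 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))
S_star, lam = shrinkage_covariance(X)
print("shrinkage intensity:", lam)
print("smallest eigenvalue:", np.linalg.eigvalsh(S_star).min())
```

With fewer samples than variables the unshrunken sample covariance is singular, whereas the shrunken matrix is positive definite whenever the intensity is positive and all sample variances are nonzero.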