Boonstra Philip S, Taylor Jeremy M G, Mukherjee Bhramar
Department of Biostatistics, University of Michigan, 1415 Washington Hts., Ann Arbor, MI, USA. Tel. +1 (734) 615-1580.
Stat Biosci. 2015 Oct 1;7(2):417-431. doi: 10.1007/s12561-015-9132-x. Epub 2015 Jun 3.
We propose an extension of the expectation-maximization (EM) algorithm, called the hyperpenalized EM (HEM) algorithm, that maximizes a penalized log-likelihood when some data are missing or unavailable, using a data-adaptive estimate of the penalty parameter. This is potentially useful in applications where the analyst is unable or unwilling to choose a single value of the penalty parameter but can instead posit a plausible range of values. The HEM algorithm is conceptually straightforward and effective, and we demonstrate its utility in the analysis of a genomic data set in which gene expression measurements and clinical covariates are used to predict survival time. However, many survival times are censored, and some observations contain only expression measurements derived from a different assay; together these constitute a difficult missing data problem. In addition, the genomic contribution should be shrunk in a data-adaptive way. The HEM algorithm successfully handles both the missing data and shrinkage aspects of the problem.
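To make the idea of a data-adaptive penalty inside an EM iteration concrete, the following is a minimal toy sketch and not the authors' implementation. The abstract does not specify the model or the hyperpenalty, so everything here is an assumption made for illustration: a linear model with unit noise variance, outcomes missing at random (coded as NaN) instead of censored survival times, a ridge penalty with parameter `lam`, and a Gamma(`a`, `b`) hyperpenalty on `lam` (shape `a`, rate `b`). The function name `hyperpenalized_em_toy` is hypothetical.

```python
import numpy as np

def hyperpenalized_em_toy(X, y, a=2.0, b=1.0, max_iter=500, tol=1e-10):
    """Toy EM-type iteration for ridge regression with missing outcomes.

    Assumptions (illustrative only, not taken from the paper):
    - y = X @ beta + noise, noise variance fixed at 1
    - missing outcomes are NaN in y and are missing at random
    - ridge penalty (lam / 2) * ||beta||^2 with a Gamma(a, b) hyperpenalty on lam
    """
    n, p = X.shape
    miss = np.isnan(y)
    beta = np.zeros(p)
    lam = a / b  # start at the mean of the assumed hyperprior
    XtX = X.T @ X
    for _ in range(max_iter):
        # E-step: replace each missing outcome by its conditional mean x_i' beta
        y_fill = np.where(miss, X @ beta, y)
        # M-step for beta: ridge solution given the current penalty parameter
        beta_new = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y_fill)
        # Penalty update: maximizer in lam of
        # (p/2 + a - 1) * log(lam) - lam * (||beta||^2 / 2 + b)
        lam = (p / 2 + a - 1) / (beta_new @ beta_new / 2 + b)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, lam

# Simulated data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_beta = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_beta + rng.normal(size=200)
y[rng.random(200) < 0.3] = np.nan  # about 30% of outcomes missing

beta_hat, lam_hat = hyperpenalized_em_toy(X, y)
print("estimated coefficients:", np.round(beta_hat, 3))
print("data-adaptive penalty:", round(float(lam_hat), 3))
```

In this sketch the analyst never fixes `lam`; the choice of `a` and `b` only expresses a plausible range for it, and the iteration settles on a value determined jointly with the coefficients. Handling censored outcomes or measurements from a second assay, as in the application described above, would require a different E-step.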