Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 485 Lexington Ave, 2nd floor, New York, NY, 10017, USA.
BMC Bioinformatics. 2019 Nov 8;20(1):555. doi: 10.1186/s12859-019-3148-z.
We previously introduced a random-effects model to analyze a set of patients, each of whom has two distinct tumors. The goal is to estimate the proportion of patients for whom one of the tumors is a metastasis of the other, i.e. for whom the tumors are clonally related. Mutations shared by the two tumors of a pair provide the evidence for clonal relatedness. In this article, we use simulations to compare two estimation approaches that we considered for our model: a constrained quasi-Newton algorithm that maximizes the likelihood conditional on the random effect, and an Expectation-Maximization (EM) algorithm in which we further condition the random-effect distribution on the data.
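To illustrate the EM idea in its simplest form, the following Python sketch estimates only a mixing proportion (the proportion of clonal cases) from per-patient likelihoods. It is a deliberately simplified assumption, not the model or code of the Clonality package: the function name `em_mixing_proportion` and its inputs are hypothetical, and the likelihoods under the clonal and independent hypotheses are treated as fixed, strictly positive numbers, whereas the actual model also involves a random effect whose distribution must be estimated.

```python
def em_mixing_proportion(lik_clonal, lik_indep, pi_init=0.5, tol=1e-8, max_iter=1000):
    """EM estimate of the clonal proportion in a two-component mixture.

    lik_clonal[i], lik_indep[i]: assumed fixed, strictly positive likelihoods
    of patient i's data under the clonal and independent hypotheses
    (hypothetical inputs, not output of the Clonality package).
    """
    def posteriors(pi):
        # E-step: posterior probability that each pair is clonal, given pi
        return [pi * lc / (pi * lc + (1.0 - pi) * li)
                for lc, li in zip(lik_clonal, lik_indep)]

    pi = pi_init
    for _ in range(max_iter):
        post = posteriors(pi)
        # M-step: the updated proportion is the mean posterior probability
        new_pi = sum(post) / len(post)
        converged = abs(new_pi - pi) < tol
        pi = new_pi
        if converged:
            break
    # Return the estimate together with the posteriors evaluated at it
    return pi, posteriors(pi)


# Illustrative, made-up likelihoods for four patients
pi_hat, post_hat = em_mixing_proportion([0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.8], pi_init=0.3)
```

Started from an interior value, each update in this sketch is an average of probabilities strictly between 0 and 1, so the iterates themselves never land exactly on 0 or 1. This is a property of the simplified sketch and is not a claim about why the EM algorithm behaves better in the full model.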
In some specific settings, especially when information is sparse, the first approach produces an estimate of the parameter of interest that lies on the boundary a non-negligible number of times, while the EM algorithm gives more satisfactory estimates. This is of considerable importance for our application: an estimate of either 0 or 1 for the proportion of clonal cases forces every individual probability of clonal relatedness to be 0 or 1, even in settings where the evidence is clearly insufficient for such definitive probability estimates.
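The effect of a boundary estimate on individual probabilities can be seen from Bayes' rule. The sketch below is an illustration under simplifying assumptions: `posterior_clonal` is a hypothetical helper, and the two likelihood values are made-up numbers standing in for the evidence from one tumor pair.

```python
def posterior_clonal(pi, lik_clonal, lik_indep):
    """Posterior probability that a tumor pair is clonal, by Bayes' rule.

    pi: estimated proportion of clonal cases (acts as the prior probability).
    lik_clonal, lik_indep: assumed positive likelihoods of the pair's data
    under the clonal and independent hypotheses (illustrative values only).
    """
    return pi * lik_clonal / (pi * lik_clonal + (1.0 - pi) * lik_indep)


# Interior estimate: the individual probability reflects the evidence
p_interior = posterior_clonal(0.5, 0.9, 0.1)

# Boundary estimates: the evidence is ignored entirely
p_zero = posterior_clonal(0.0, 0.9, 0.1)  # data favor clonal, yet the result is 0
p_one = posterior_clonal(1.0, 0.1, 0.9)   # data favor independent, yet the result is 1
```

With the proportion estimated at 0 or 1, the individual probability is 0 or 1 whatever the mutation data show, which is why boundary estimates are problematic when the evidence is weak.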
The EM algorithm is the preferable approach for our clonality random-effects model. It is now the method implemented in our R package Clonality, which provides an easy and fast way to estimate this model in a range of applications.