Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain.
Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain.
PLoS Comput Biol. 2019 Aug 2;15(8):e1007246. doi: 10.1371/journal.pcbi.1007246. eCollection 2019 Aug.
Successful prediction of the likely paths of tumor progression is valuable for diagnostic, prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations, and thus CPMs encode the paths of tumor progression. Here we analyze the performance of four CPMs to examine whether they can be used to predict the true distribution of paths of tumor progression and to estimate evolutionary unpredictability. Using simulations, we show that if fitness landscapes are single-peaked (have a single fitness maximum), there is good agreement between true and predicted distributions of paths of tumor progression when sample sizes are large, but performance is poor with the much smaller sample sizes that are currently common. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all cases, detection regime (when tumors are sampled) is a key determinant of performance. Estimates of evolutionary unpredictability from the best-performing of the four CPMs examined tend to overestimate the true unpredictability, and the bias is affected by detection regime; CPMs could therefore be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability for several of the data sets. However, most of the predictions of paths of tumor progression are very unreliable, and unreliability increases with the number of features analyzed. Our results indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, it is doubtful that useful predictions of paths of tumor progression can be obtained from CPMs. The results also emphasize the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer.
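The two quantities discussed in the abstract, agreement between true and predicted distributions of paths and evolutionary unpredictability, can be illustrated with a minimal sketch. The abstract does not state how either is measured, so the choices here are assumptions: unpredictability is taken as the Shannon entropy of the distribution over paths, and agreement as the Jensen-Shannon divergence between the two distributions. The gene names (`A`, `B`, `C`) and all probabilities are hypothetical and are not taken from the study.

```python
import math

def path_entropy(dist):
    """Shannon entropy (in nats) of a mapping {path: probability}.

    Assumption: used here as a stand-in for evolutionary unpredictability.
    """
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so it lies in [0, 1].

    0 means the two path distributions are identical; 1 means they
    share no paths at all.
    """
    paths = set(p) | set(q)
    # Mixture distribution over the union of paths.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in paths}

    def kl_to_mixture(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)

# Hypothetical example: each path is an order of mutation accumulation.
true_paths = {("A", "B", "C"): 0.7, ("B", "A", "C"): 0.3}
predicted_paths = {
    ("A", "B", "C"): 0.5,
    ("B", "A", "C"): 0.25,
    ("A", "C", "B"): 0.25,  # a path the (hypothetical) model wrongly allows
}

if __name__ == "__main__":
    print("true unpredictability:     ", path_entropy(true_paths))
    print("predicted unpredictability:", path_entropy(predicted_paths))
    print("JS divergence:             ", js_divergence(true_paths, predicted_paths))
```

In this toy case the predicted distribution spreads probability over an extra path, so its entropy exceeds that of the true distribution. This mirrors, in miniature, the overestimation of unpredictability described above.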