Desper R, Jiang F, Kallioniemi O P, Moch H, Papadimitriou C H, Schäffer A A
Deutsches Krebsforschungszentrum, Abt. Theoretische Bioinformatik, Heidelberg, Germany.
J Comput Biol. 2000;7(6):789-803. doi: 10.1089/10665270050514936.
Comparative genomic hybridization (CGH) is a laboratory method to measure gains and losses in the copy number of chromosomal regions in tumor cells. It is hypothesized that certain DNA gains and losses are related to cancer progression and that the patterns of these changes are relevant to the clinical consequences of the cancer. It is therefore of interest to develop models that predict the occurrence of these events, as well as techniques for learning such models from CGH data. We continue the study, begun in Desper et al. (1999), of the mathematical foundations for inferring a model of tumor progression from a CGH data set. In that paper, we proposed a class of probabilistic tree models and showed that, under plausible assumptions, an algorithm based on maximum-weight branching in a graph correctly infers the topology of the tree. In this paper, we extend that work to so-called distance-based trees, in which events are the leaves of the tree, in the style of models common in phylogenetics. We then show how to reconstruct distance-based trees using tree-fitting algorithms developed by researchers in phylogenetics. The main advantages of the distance-based models are that 1) they represent information about co-occurrences of all pairs of events, instead of just some pairs, 2) they allow quantitative predictions about which events occur early in tumor progression, and 3) they bring into play the extensive methodology and software developed in the context of phylogenetics. We illustrate the distance-based tree method, and how it complements the branching tree method, using a CGH data set for renal cancer.
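As a rough illustration of the distance-based idea, the sketch below turns a binary tumor-by-event matrix into pairwise event distances and then fits an unrooted tree whose leaves are the events, using neighbor joining, one standard phylogenetic distance-based tree-fitting method. Several things here are assumptions made for illustration and are not taken from the paper: the input layout (one row per tumor, one 0/1 flag per CGH event), the distance d(i, j) = -log(p_ij / sqrt(p_i * p_j)) with a pseudocount, the choice of neighbor joining as the tree-fitting algorithm, and the hypothetical function names `cooccurrence_distances` and `neighbor_joining`.

```python
import math
from itertools import combinations


def cooccurrence_distances(data, pseudocount=0.5):
    """Hypothetical event distance: d(i, j) = -log(p_ij / sqrt(p_i * p_j)).

    data: list of tumors, each a list of 0/1 flags, one per event (assumed layout).
    Frequent co-occurrence gives a small distance; the pseudocount keeps every
    probability positive so the logarithm is always defined.
    """
    n_tumors, n_events = len(data), len(data[0])
    denom = n_tumors + 2 * pseudocount
    p = [(sum(row[i] for row in data) + pseudocount) / denom for i in range(n_events)]
    d = [[0.0] * n_events for _ in range(n_events)]
    for i, j in combinations(range(n_events), 2):
        both = sum(1 for row in data if row[i] and row[j])
        p_ij = (both + pseudocount) / denom
        # p_ij <= min(p_i, p_j) <= sqrt(p_i * p_j), so the distance is >= 0;
        # max() only guards against floating-point rounding.
        d[i][j] = d[j][i] = max(0.0, -math.log(p_ij / math.sqrt(p[i] * p[j])))
    return d


def neighbor_joining(d, labels):
    """Fit an unrooted tree with neighbor joining (Saitou and Nei).

    Returns a list of (node, node, branch_length) edges; events are leaves and
    internal nodes are named internal0, internal1, ...
    """
    dist = {a: {b: d[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Join the pair that minimizes the neighbor-joining Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda pr: (n - 2) * dist[pr[0]][pr[1]] - r[pr[0]] - r[pr[1]],
        )
        u = f"internal{next_id}"
        next_id += 1
        len_a = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(u, a, len_a), (u, b, dist[a][b] - len_a)]
        # Distances from the new internal node to every remaining node.
        dist[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist[u][c] = dist[c][u] = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    x, y = nodes
    edges.append((x, y, dist[x][y]))
    return edges


if __name__ == "__main__":
    # Toy, made-up data: events e1 and e2 always co-occur, e3 mostly does not.
    toy = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 1, 0], [0, 0, 1, 1]]
    for edge in neighbor_joining(cooccurrence_distances(toy), ["e1", "e2", "e3", "e4"]):
        print(edge)
```

Neighbor joining is used here only because it is short and well known; it returns an unrooted tree, so any rooting or "early event" interpretation would need an extra step that this sketch does not attempt.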