Roman Theodore, Xie Lu, Schwartz Russell
Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA.
Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program in Computational Biology, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA.
BMC Genomics. 2016 Jan 11;17 Suppl 1(Suppl 1):6. doi: 10.1186/s12864-015-2302-x.
Despite the enormous medical impact of cancers and intensive study of their biology, a detailed characterization of tumor growth and development remains elusive. The difficulty stems in large part from the enormous heterogeneity in the molecular mechanisms of cancer progression, both from tumor to tumor and from cell to cell within single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by the limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and of the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been to use computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be reduced computationally to reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method that improves this reconstruction by better identifying the subspaces that correspond to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method, based on medoidshift clustering, for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that the new method substantially improves our ability to resolve discrete tumor subgroups, a key step in accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data.
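To make the clustering idea in the abstract concrete, the following is a minimal sketch of plain medoidshift clustering, the general technique the method is described as being based on. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the function name `medoidshift`, and the toy two-dimensional points are all assumptions chosen for illustration. Each sample moves to the sample that minimizes a kernel-weighted sum of squared distances around it, and samples that end at the same mode form one cluster.

```python
import math


def medoidshift(points, bandwidth):
    """Return, for each sample, the index of the mode (a sample) it shifts to.

    Illustrative sketch only: the Gaussian kernel and the bandwidth value
    are assumptions, not details taken from the paper.
    """
    n = len(points)
    # Pairwise squared Euclidean distances between samples.
    sq_dist = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
               for p in points]
    # Gaussian kernel weight between every pair of samples.
    weight = [[math.exp(-d / bandwidth ** 2) for d in row] for row in sq_dist]
    # One medoidshift step: sample i points to the sample k that minimizes
    # the kernel-weighted sum of squared distances in i's neighborhood.
    nxt = []
    for i in range(n):
        cost = [sum(sq_dist[k][j] * weight[i][j] for j in range(n))
                for k in range(n)]
        nxt.append(min(range(n), key=cost.__getitem__))
    # Follow the pointers until they stop changing (capped at n hops).
    labels = list(nxt)
    for _ in range(n):
        updated = [nxt[m] for m in labels]
        if updated == labels:
            break
        labels = updated
    return labels


# Hypothetical toy data: two tight, well-separated groups of samples.
cluster_a = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
cluster_b = [(x + 5.0, y + 5.0) for x, y in cluster_a]
labels = medoidshift(cluster_a + cluster_b, bandwidth=1.0)
print(labels)
```

On this toy input every sample shifts to the central sample of its own group, so two modes are recovered without the number of clusters being specified in advance, which is the nonparametric property the abstract relies on. The full medoidshift procedure also re-runs the shift on the recovered modes to merge nearby ones; that step is omitted here for brevity.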