Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania.
Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute, Indiana.
J Comput Biol. 2020 Apr;27(4):565-598. doi: 10.1089/cmb.2019.0302. Epub 2020 Mar 16.
Characterizing intratumor heterogeneity (ITH) is crucial to understanding cancer development, but it is hampered by the limitations of available data sources. Bulk DNA sequencing is the most common technology for assessing ITH, but each sample is a mixture of many genetically distinct cells, which must then be computationally deconvolved. Single-cell sequencing is a promising alternative, but its limitations (for example, high noise, difficulty scaling to large populations, technical artifacts, and large data sets) have so far made it impractical for studying cohorts of sufficient size to identify statistically robust features of tumor evolution. We have developed strategies for deconvolution and tumor phylogenetics that combine limited amounts of bulk and single-cell data to gain some of the advantages of single-cell resolution at much lower cost, with a specific focus on deconvolving genomic copy number data. We developed a mixed membership model for clonal deconvolution via non-negative matrix factorization that balances deconvolution quality against similarity to single-cell samples, together with an associated efficient coordinate descent algorithm. We then improved on that algorithm by integrating deconvolution with clonal phylogeny inference, using a mixed integer linear programming model to incorporate a minimum evolution phylogenetic tree cost into the problem objective. We demonstrate the effectiveness of these methods on semisimulated data with known ground truth, showing improved deconvolution accuracy relative to using bulk data alone.
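As an illustration only, the idea of a factorization that trades reconstruction quality against similarity to single-cell samples can be sketched as below. This is not the authors' implementation. The objective, the names (`deconvolve`, `objective`, `project_columns_to_simplex`, `alpha`), the assumption of exactly one single-cell reference profile per clone, and the use of alternating projected gradient steps as a simple block coordinate descent are all assumptions made for this sketch. The phylogeny-aware mixed integer linear programming extension is not covered.

```python
import numpy as np


def project_columns_to_simplex(F):
    """Euclidean projection of each column of F onto the probability simplex."""
    k, n = F.shape
    U = np.sort(F, axis=0)[::-1]                 # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, k + 1).reshape(-1, 1)
    rho = (U - css / idx > 0).sum(axis=0)        # size of the support for each column
    theta = css[rho - 1, np.arange(n)] / rho     # per-column shift
    return np.maximum(F - theta, 0.0)


def objective(B, C, F, S, alpha):
    """Assumed objective: 0.5*||B - C F||_F^2 + 0.5*alpha*||C - S||_F^2."""
    fit = 0.5 * np.linalg.norm(B - C @ F) ** 2
    similarity = 0.5 * alpha * np.linalg.norm(C - S) ** 2
    return fit + similarity


def deconvolve(B, S, alpha=1.0, n_iter=200, seed=0):
    """Alternate projected gradient steps on F (mixture fractions) and C (clone profiles).

    B: loci x samples bulk copy numbers.
    S: loci x clones single-cell reference profiles (hypothetical input layout).
    """
    rng = np.random.default_rng(seed)
    k, n = S.shape[1], B.shape[1]
    C = np.maximum(np.asarray(S, dtype=float), 0.0)     # start from the references
    F = project_columns_to_simplex(rng.random((k, n)))
    history = []
    for _ in range(n_iter):
        # F block: step size 1/L_F, where L_F bounds the curvature in F.
        L_F = np.linalg.norm(C, 2) ** 2 + 1e-12
        F = project_columns_to_simplex(F - C.T @ (C @ F - B) / L_F)
        # C block: gradient includes the pull toward the single-cell references.
        L_C = np.linalg.norm(F, 2) ** 2 + alpha
        grad_C = (C @ F - B) @ F.T + alpha * (C - S)
        C = np.maximum(C - grad_C / L_C, 0.0)
        history.append(objective(B, C, F, S, alpha))
    return C, F, history
```

In this sketch, `alpha` controls the balance: `alpha = 0` reduces to a plain constrained factorization of the bulk data, while large values keep the inferred clone profiles close to the single-cell references. Each block update uses a step size of one over the block's Lipschitz constant, so the assumed objective does not increase from one iteration to the next.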