Lee Juhee, Müller Peter, Sengupta Subhajit, Gulukota Kamalakar, Ji Yuan
Department of Applied Mathematics and Statistics, University of California Santa Cruz.
Department of Mathematics, University of Texas Austin.
J R Stat Soc Ser C Appl Stat. 2016 Aug;65(4):547-563. doi: 10.1111/rssc.12136. Epub 2016 Jan 12.
Tumor samples are heterogeneous. They consist of different subclones that are characterized by differences in DNA nucleotide sequences and copy numbers at multiple loci. Heterogeneity can be measured by identifying the subclonal copy number and sequence at a selected set of loci. Because accurate identification of variant allele fractions depends heavily on precise determination of copy numbers, we develop a Bayesian feature allocation model that jointly calls subclonal copy numbers and the corresponding allele sequences at the same loci. The proposed method uses three random matrices, L, Z and w, to represent subclonal copy numbers (L), numbers of subclonal variant alleles (Z) and cellular fractions of subclones in samples (w), respectively. Because the number of subclones is unknown, these matrices have a random number of columns. We use next-generation sequencing data to estimate the subclonal structures through inference on these three matrices. Using simulation studies and a real data analysis, we demonstrate how posterior inference on the subclonal structure is enhanced by jointly modeling both structure and sequencing variants on subclonal genomes. Software is available at http://compgenome.org/BayClone2.
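To make the role of the three matrices concrete, the following is a minimal sketch of how they could combine into sample-level quantities. It denotes the matrices as `L` (copy numbers), `Z` (variant allele counts) and `w` (cellular fractions), and assumes that a sample's mean copy number at a locus is the fraction-weighted sum of subclonal copy numbers, with the expected variant allele fraction given by the weighted variant count divided by that mean. The function name, the toy dimensions and every number below are hypothetical and are not taken from the paper or from the BayClone2 software; the published model is richer than this (for example, it places priors on the matrices and models sequencing read counts).

```python
# Hypothetical toy example: 3 loci (rows), 2 subclones (columns), 2 samples.
# All values are invented for illustration; none come from the paper.
L = [[2, 3], [2, 2], [2, 1]]  # L[s][c]: copy number at locus s in subclone c
Z = [[0, 1], [1, 2], [0, 0]]  # Z[s][c]: variant allele copies, 0 <= Z <= L
w = [[0.7, 0.3], [0.2, 0.8]]  # w[t][c]: fraction of subclone c in sample t


def sample_level_summaries(L, Z, w):
    """Return (M, p): mean copy number and expected VAF per locus and sample.

    Assumed relations (a simplification, not the paper's full model):
        M[s][t] = sum_c w[t][c] * L[s][c]
        p[s][t] = sum_c w[t][c] * Z[s][c] / M[s][t]
    """
    for row_l, row_z in zip(L, Z):
        for l, z in zip(row_l, row_z):
            if not 0 <= z <= l:
                raise ValueError("variant count must lie between 0 and copy number")
    for fractions in w:
        if abs(sum(fractions) - 1.0) > 1e-9:
            raise ValueError("cellular fractions in each sample must sum to 1")

    M, p = [], []
    for row_l, row_z in zip(L, Z):
        m_row, p_row = [], []
        for fractions in w:
            mean_cn = sum(f * l for f, l in zip(fractions, row_l))
            variant = sum(f * z for f, z in zip(fractions, row_z))
            m_row.append(mean_cn)
            # The VAF is undefined when the locus is deleted in every subclone.
            p_row.append(variant / mean_cn if mean_cn > 0 else float("nan"))
        M.append(m_row)
        p.append(p_row)
    return M, p


M, p = sample_level_summaries(L, Z, w)
for s, (m_row, p_row) in enumerate(zip(M, p)):
    print(f"locus {s}: mean copy number {m_row}, expected VAF {p_row}")
```

The sketch also shows why the two quantities are called jointly: the copy-number matrix enters the denominator of the expected variant allele fraction, so an error in `L` propagates directly into the inferred fractions.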