Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Dr, Rockville MD 20850 USA.
Biostatistics. 2022 Jan 13;23(1):69-82. doi: 10.1093/biostatistics/kxaa013.
Allele-specific copy number alteration (ASCNA) analysis is used to identify copy number abnormalities in tumor cells. Unlike normal cells, tumor cells are heterogeneous: they are a combination of dominant and minor subclones with distinct copy number profiles. Estimating the clonal proportion and identifying main-clone and subclone genotypes across the genome are important for understanding tumor progression. Several ASCNA tools have recently been developed, but they are limited to identifying subclone regions and do not identify the genotypes of the subclones. In this article, we propose subHMM, a hidden Markov model-based approach that estimates both the subclone regions and the region-specific subclone genotypes and clonal proportions. We specify a hidden state variable that represents the conglomeration of clonal genotype and subclone status. We propose a two-step algorithm for parameter estimation. In the first step, a standard hidden Markov model with this conglomerated state variable is fit. In the second step, region-specific estimates of the clonal proportions are obtained by maximizing region-specific pseudo-likelihoods. We apply subHMM to renal cell carcinoma datasets in The Cancer Genome Atlas. In addition, we conduct simulation studies that demonstrate the good performance of the proposed approach. The R source code is available online at https://dceg.cancer.gov/tools/analysis/subhmm.
Keywords: Expectation-Maximization algorithm; Forward-backward algorithm; Somatic copy number alteration; Tumor subclones.
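The two-step idea described in the abstract can be illustrated with a deliberately simplified Python sketch. It is not the subHMM implementation (which is distributed as R code) and it does not use the paper's emission model: it assumes two hidden states with Gaussian emissions, fixed parameters in place of EM estimation, and a grid search over a toy region-specific Gaussian log-likelihood. All names (`forward_backward`, `estimate_proportion`, `FULL_SHIFT`) and all numbers are hypothetical.

```python
import math

def gauss_logpdf(x, mean, sd):
    """Log-density of a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def forward_backward(obs, init, trans, means, sd):
    """Scaled forward-backward pass; returns posterior state probabilities."""
    n, k = len(obs), len(init)
    emit = [[math.exp(gauss_logpdf(x, means[s], sd)) for s in range(k)] for x in obs]
    alpha = [[0.0] * k for _ in range(n)]
    beta = [[1.0] * k for _ in range(n)]
    scale = [0.0] * n
    for t in range(n):  # forward pass, rescaled at every step
        for s in range(k):
            if t == 0:
                prior = init[s]
            else:
                prior = sum(alpha[t - 1][r] * trans[r][s] for r in range(k))
            alpha[t][s] = prior * emit[t][s]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(n - 2, -1, -1):  # backward pass with the same scaling
        for s in range(k):
            beta[t][s] = sum(
                trans[s][r] * emit[t + 1][r] * beta[t + 1][r] for r in range(k)
            ) / scale[t + 1]
    post = []
    for t in range(n):
        w = [alpha[t][s] * beta[t][s] for s in range(k)]
        total = sum(w)
        post.append([v / total for v in w])
    return post

def estimate_proportion(region_obs, full_shift, sd, grid_size=101):
    """Grid-search maximizer of a toy region-specific log-likelihood in p."""
    best_p, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        p = i / (grid_size - 1)
        ll = sum(gauss_logpdf(x, p * full_shift, sd) for x in region_obs)
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p

# Hypothetical signal: an altered segment at positions 4 to 8.
obs = [0.02, -0.05, 0.01, 0.03, -0.31, -0.28, -0.33, -0.29, -0.30, 0.00, 0.04, -0.02]
FULL_SHIFT = -0.6   # assumed shift if every cell carried the alteration
SD = 0.1            # assumed noise level

# Step 1: HMM with state 0 = unaltered, state 1 = altered (assumed proportion 0.5).
post = forward_backward(
    obs,
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 0.5 * FULL_SHIFT],
    sd=SD,
)
region = [t for t, row in enumerate(post) if row[1] > 0.5]

# Step 2: refine the proportion using only the observations in the called region.
p_hat = estimate_proportion([obs[t] for t in region], FULL_SHIFT, SD)
print(region, p_hat)
```

The sketch separates region calling (step 1) from region-specific estimation (step 2) to mirror the structure described above; in practice the first step would also estimate the HMM parameters, for example by Expectation-Maximization, rather than fixing them.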