Lin Yen-Jen, Chen Yu-Tin, Hsu Shu-Ni, Peng Chien-Hua, Tang Chuan-Yi, Yen Tzu-Chen, Hsieh Wen-Ping
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
Institute of Statistics, National Tsing Hua University, Hsinchu, Taiwan.
PLoS One. 2014 May 21;9(5):e96841. doi: 10.1371/journal.pone.0096841. eCollection 2014.
Copy number variation (CNV) has been reported to be associated with disease and various cancers. Hence, accurately identifying the position and type of each CNV is a critical issue. Many tools target detecting CNV regions, constructing haplotype phases in CNV regions, or estimating numerical copy numbers. However, none of them can perform all three tasks at the same time. This paper presents a method based on a hidden Markov model to detect parent-specific copy number changes on both chromosomes using signals from SNP arrays. A haplotype tree is constructed with dynamic branch merging to model the transition of the copy number status of the two alleles assessed at each SNP locus. Emission models are constructed for the genotypes formed by the two haplotypes. The proposed method provides the segmentation points of the CNV regions as well as the haplotype phasing for the allelic status on each chromosome. The estimated copy numbers are provided as fractional numbers, which can accommodate somatic mutations in cancer specimens, as these usually consist of heterogeneous cell populations. The algorithm was evaluated on simulated data and on the previously published CNV regions of the 270 HapMap individuals. The results were compared with five popular methods: PennCNV, genoCN, COKGEN, QuantiSNP and cnvHap. An application to oral cancer samples demonstrates how the proposed method can facilitate clinical association studies. In our genome-wide study, the proposed algorithm exhibits sensitivity for CNV regions comparable to that of the best algorithm and demonstrates the highest detection rate in SNP-dense regions. In addition, it provides better haplotype phasing accuracy than similar approaches. The clinical association analysis carried out with our fractional copy number estimates in the cancer samples provides better detection power than that with integer copy number states.
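To make the general idea of HMM-based CNV segmentation concrete, the following is a minimal sketch of Viterbi decoding over total copy-number states from per-SNP log R ratio values. It is not the method of this paper: it uses plain integer states rather than a haplotype tree with dynamic branch merging, and it does no phasing or fractional estimation. The state set, the emission means, the noise level, the transition probability, and all function names are illustrative assumptions.

```python
import math

# Assumed total copy-number states and assumed mean log R ratio (LRR) per state.
# These values are illustrative only and are not taken from the paper.
STATES = [1, 2, 3]
LRR_MEAN = {1: -0.6, 2: 0.0, 3: 0.4}
LRR_SD = 0.2        # assumed common Gaussian noise level
STAY_PROB = 0.99    # assumed probability of keeping the same state at the next SNP


def log_gauss(x, mean, sd):
    """Log density of a Gaussian emission."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_trans(prev, cur):
    """Log transition probability: stay with STAY_PROB, else switch uniformly."""
    if prev == cur:
        return math.log(STAY_PROB)
    return math.log((1 - STAY_PROB) / (len(STATES) - 1))


def viterbi(lrr):
    """Return the most likely copy-number state at each SNP."""
    if not lrr:
        return []
    # Uniform prior over states at the first SNP.
    score = {s: -math.log(len(STATES)) + log_gauss(lrr[0], LRR_MEAN[s], LRR_SD)
             for s in STATES}
    back = []
    for x in lrr[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = (score[best_prev] + log_trans(best_prev, cur)
                              + log_gauss(x, LRR_MEAN[cur], LRR_SD))
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a state path into (start, end, state) runs, with end inclusive."""
    out, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            out.append((start, i - 1, path[start]))
            start = i
    return out


if __name__ == "__main__":
    # Toy signal: a run of low LRR values in the middle suggests a one-copy loss.
    toy = [0.0, 0.05, -0.02, -0.6, -0.55, -0.65, -0.58, 0.01, 0.0]
    print(segments(viterbi(toy)))
```

The boundaries of each run returned by `segments` play the role of segmentation points. A parent-specific model would replace the single total copy number per SNP with a pair of per-haplotype allele states and correspondingly richer emission models.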