Stjernqvist Susann, Rydén Tobias, Greenman Chris D
Centre for Mathematical Sciences, Lund University, Box 118, 221 00 Lund, Sweden; Department of Mathematics, Royal Institute of Technology, 100 44 Stockholm, Sweden.
Cancer Inform. 2011;10:159-73. doi: 10.4137/CIN.S6873. Epub 2011 May 25.
SNP allelic copy number data provide separate intensity measurements for the two alleles. We present a method that estimates the number of copies of each allele at each SNP position using a continuous-index hidden Markov model. The method is especially suited to cancer data because it incorporates the fraction of normal tissue contamination, which is often present in tumor samples, into the model. The continuous-index structure takes the distances between SNPs into account and is therefore also appropriate when SNPs are unequally spaced. In a simulation study we show that the method performs favorably compared to previous methods even with as much as 70% normal contamination. We also provide results from applications to clinical data produced using the Affymetrix genome-wide SNP 6.0 platform.
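Two ideas in the abstract can be illustrated with a small sketch: transition probabilities that depend on the distance between neighbouring SNPs, and a signal that mixes tumor and normal cells. The code below is not the authors' model. It assumes a symmetric continuous-time Markov chain (every state is left at the same hypothetical `rate` and all other states are equally likely targets), which gives a closed-form transition matrix, and it assumes normal cells carry one copy of each allele, as at a heterozygous SNP. The function names and parameters are illustrative only.

```python
import math


def transition_matrix(n_states, rate, distance):
    """Transition probabilities across a gap of `distance` base pairs.

    Assumption: a symmetric continuous-time Markov chain that leaves each
    state at `rate` per base pair and jumps to any other state with equal
    probability. For this special case exp(Q * distance) has a closed form.
    """
    if n_states < 2:
        raise ValueError("need at least two states")
    if rate < 0 or distance < 0:
        raise ValueError("rate and distance must be non-negative")
    # Non-zero eigenvalue of the generator is -rate * K / (K - 1).
    decay = math.exp(-rate * n_states / (n_states - 1) * distance)
    stay = 1.0 / n_states + (1.0 - 1.0 / n_states) * decay
    move = (1.0 - decay) / n_states
    return [[stay if i == j else move for j in range(n_states)]
            for i in range(n_states)]


def expected_allele_copies(tumor_copies, normal_fraction, normal_copies=1):
    """Average copies of one allele in a tumor/normal mixture.

    Assumption: normal cells carry `normal_copies` of the allele
    (1 at a heterozygous SNP); `normal_fraction` is the share of
    normal cells in the sample.
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must lie in [0, 1]")
    return (normal_fraction * normal_copies
            + (1.0 - normal_fraction) * tumor_copies)


if __name__ == "__main__":
    # Closely spaced SNPs rarely change state; distant SNPs are nearly independent.
    near = transition_matrix(n_states=3, rate=1e-6, distance=1_000)
    far = transition_matrix(n_states=3, rate=1e-6, distance=5_000_000)
    print("P(stay) over 1 kb:", near[0][0])
    print("P(stay) over 5 Mb:", far[0][0])
    # With 70% normal cells, a three-copy tumor allele is strongly diluted.
    print("expected copies:", expected_allele_copies(3, 0.7))
```

With a general, non-symmetric generator the closed form no longer applies and the transition matrix would have to be computed as a matrix exponential of the generator scaled by the inter-SNP distance. The mixture function shows why heavy contamination is hard: as `normal_fraction` grows, the expected signals for different tumor copy numbers move closer together.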