Tze Leung Lai, Haipeng Xing, Nancy Zhang
Department of Statistics and Cancer Center, Stanford University, Stanford, CA 94305-4065, USA.
Biostatistics. 2008 Apr;9(2):290-307. doi: 10.1093/biostatistics/kxm031. Epub 2007 Sep 12.
Array-based comparative genomic hybridization (array-CGH) is a high-throughput, high-resolution technique for studying the genetics of cancer. Analysis of array-CGH data typically involves estimating the underlying chromosome copy numbers from the log fluorescence ratios and segmenting each chromosome into regions within which the copy number is constant. We propose a new stochastic segmentation model for the analysis of array-CGH data, together with an associated estimation procedure that has attractive statistical and computational properties. An important benefit of this Bayesian segmentation model is that it yields explicit formulas for the posterior means, which can be used to estimate the signal directly without performing segmentation. Our method can also estimate other quantities related to the posterior distribution that are useful for assessing confidence in any given segmentation. We further propose an approximation method whose computation time is linear in the sequence length, which makes the approach practical for the newer, higher-density arrays. Simulation studies and applications to real array-CGH data illustrate the advantages of the proposed approach.
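The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions of how posterior means can estimate the signal without committing to one segmentation. It assumes a simple change-point model: the mean level theta at the first probe is drawn from N(mu, v); at each later probe, with probability p a change occurs and theta is redrawn from N(mu, v), otherwise it stays the same; each observed log ratio is y_t = theta_t + N(0, sigma2) noise. The function name `posterior_means` and the parameters `p`, `mu`, `v`, `sigma2` are hypothetical, and the hyperparameters are treated as known. The exact posterior mean is computed here by forward-backward sums over every segment that could contain each probe, which costs O(n^2). This is not the authors' linear-time approximation, which is not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(x - top) for x in values))


def posterior_means(y: Sequence[float], p: float, mu: float, v: float,
                    sigma2: float) -> List[float]:
    """Smoothed posterior means E[theta_t | y] under an ASSUMED change-point model.

    Hypothetical model: theta at position 0 ~ N(mu, v); at each later position a
    change occurs with probability p and theta is redrawn from N(mu, v), otherwise
    theta is unchanged; y_t = theta_t + N(0, sigma2). Exact, O(n^2) time.
    """
    n = len(y)
    # Prefix sums give each segment's sum and sum of squares in O(1).
    s, q = [0.0] * (n + 1), [0.0] * (n + 1)
    for t, yt in enumerate(y):
        s[t + 1] = s[t] + yt
        q[t + 1] = q[t] + yt * yt
    log_stay, log_change = math.log1p(-p), math.log(p)

    def seg(i: int, j: int) -> Tuple[float, float]:
        """For a segment over 0-based positions i..j (inclusive), return
        (log marginal likelihood + log (1-p)^(j-i) no-change prior, posterior mean)."""
        k = j - i + 1
        total, sq = s[j + 1] - s[i], q[j + 1] - q[i]
        prec = k / sigma2 + 1.0 / v      # posterior precision of theta
        b = total / sigma2 + mu / v      # precision-weighted numerator of the mean
        loglik = (-0.5 * k * math.log(2 * math.pi * sigma2)
                  - 0.5 * math.log(v * prec)
                  + b * b / (2 * prec) - sq / (2 * sigma2) - mu * mu / (2 * v))
        return loglik + (k - 1) * log_stay, b / prec

    def start(i: int) -> float:  # log prior of a change at position i
        return log_change if i > 0 else 0.0

    def end(j: int) -> float:    # log prior of a change right after position j
        return log_change if j < n - 1 else 0.0

    # Forward: fwd[j] sums over segmentations of y_0..y_{j-1} whose last segment
    # ends at j-1 (the change prior after j-1 is not yet included).
    fwd = [0.0] + [-math.inf] * n
    for j in range(1, n + 1):
        fwd[j] = _logsumexp([fwd[i] + start(i) + seg(i, j - 1)[0] for i in range(j)])
    # Backward: bwd[i] = log p(y_i..y_{n-1} | a segment starts at i).
    bwd = [-math.inf] * n + [0.0]
    for i in range(n - 1, -1, -1):
        bwd[i] = _logsumexp([seg(i, j)[0] + end(j) + bwd[j + 1] for j in range(i, n)])

    # Each segment [i, j] adds (its posterior probability) * (its posterior mean)
    # to every t in i..j; a difference array keeps this step O(n^2) overall.
    diff = [0.0] * (n + 1)
    for i in range(n):
        for j in range(i, n):
            loglik, mean = seg(i, j)
            logw = fwd[i] + start(i) + loglik + end(j) + bwd[j + 1] - fwd[n]
            contrib = math.exp(logw) * mean
            diff[i] += contrib
            diff[j + 1] -= contrib
    out, running = [], 0.0
    for t in range(n):
        running += diff[t]
        out.append(running)
    return out
```

For example, `posterior_means([0.0] * 20 + [5.0] * 20, p=0.01, mu=0.0, v=10.0, sigma2=0.25)` gives values close to 0 for the first 20 positions and close to 5 for the last 20, as an average over all segmentations weighted by their posterior probabilities. The quadratic cost of summing over all segment pairs is what a linear-time approximation, such as the one the abstract describes, would need to avoid.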