Sun Wei, Wright Fred A, Tang Zhengzheng, Nordgard Silje H, Van Loo Peter, Yu Tianwei, Kristensen Vessela N, Perou Charles M
Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA.
Nucleic Acids Res. 2009 Sep;37(16):5365-77. doi: 10.1093/nar/gkp493. Epub 2009 Jul 6.
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
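As an illustration of why contamination by normal tissue matters for CNA calling, the sketch below computes the expected B allele frequency (BAF) of a tumor sample mixed with normal cells. It is not genoCN's actual model: the function name `expected_baf`, the `purity` parameter, and the simple linear mixing formula are assumptions made for this example, and BAF is used here only as a commonly reported SNP-array measurement.

```python
def expected_baf(tumor_b, tumor_total, normal_b, purity):
    """Expected B allele frequency for a tumor/normal cell mixture.

    Hypothetical helper for illustration only (not part of genoCN).

    tumor_b     -- copies of the B allele in tumor cells
    tumor_total -- total copy number in tumor cells
    normal_b    -- copies of the B allele in normal cells (0, 1, or 2)
    purity      -- assumed fraction of tumor cells in the sample, in [0, 1]
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= tumor_b <= tumor_total:
        raise ValueError("tumor_b must be between 0 and tumor_total")

    normal_total = 2  # normal cells are assumed diploid
    # Average B-allele and total copy counts across the two cell populations.
    b_copies = purity * tumor_b + (1.0 - purity) * normal_b
    total_copies = purity * tumor_total + (1.0 - purity) * normal_total
    if total_copies == 0:
        raise ValueError("BAF is undefined when no DNA copies are present")
    return b_copies / total_copies
```

For a germline heterozygous SNP (AB) where the tumor has lost the A allele, `expected_baf(1, 1, 1, purity)` gives 1.0 for a pure tumor sample, about 0.67 at 50% purity, and 0.5 for purely normal tissue. A deletion in a contaminated sample can therefore look like a much weaker signal than in a pure one, which is the kind of effect a contamination-aware model has to account for.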