Wang Kai, Chen Zhen, Tadesse Mahlet G, Glessner Joseph, Grant Struan F A, Hakonarson Hakon, Bucan Maja, Li Mingyao
Department of Genetics, Division of Human Genetics, Center for Applied Genomics, The Children's Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA 19104, USA.
Nucleic Acids Res. 2008 Dec;36(21):e138. doi: 10.1093/nar/gkn641. Epub 2008 Oct 2.
Copy number variations (CNVs) are being used as genetic markers or functional candidates in gene-mapping studies. However, unlike single nucleotide polymorphism or microsatellite genotyping techniques, most CNV detection methods are limited to detecting total copy numbers, rather than copy number in each of the two homologous chromosomes. To address this issue, we developed a statistical framework for intensity-based CNV detection platforms using family data. Our algorithm identifies CNVs for a family simultaneously, thus avoiding the generation of calls with Mendelian inconsistency while maintaining the ability to detect de novo CNVs. Applications to simulated data and real data indicate that our method significantly improves both call rates and accuracy of boundary inference, compared to existing approaches. We further illustrate the use of Mendelian inheritance to infer SNP allele compositions in each of the two homologous chromosomes in CNV regions using real data. Finally, we applied our method to a set of families genotyped using both the Illumina HumanHap550 and Affymetrix genome-wide 5.0 arrays to demonstrate its performance on both inherited and de novo CNVs. In conclusion, our method produces accurate CNV calls, gives probabilistic estimates of CNV transmission and builds a solid foundation for the development of linkage and association tests utilizing CNVs.
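The distinction the abstract draws between total copy number and copy number on each homologous chromosome can be illustrated with a small sketch. This is not the authors' statistical framework. It is a toy enumeration, assuming autosomal loci and a hypothetical cap of two copies per chromosome (`max_per_chrom`), that lists which child totals a pair of parental totals could transmit under Mendelian inheritance. All function names are illustrative.

```python
from itertools import product


def haplotype_splits(total, max_per_chrom=2):
    """Unordered ways to split a total copy number across two homologs.

    max_per_chrom is an assumed cap on copies carried by one chromosome.
    """
    return [
        (a, total - a)
        for a in range(total // 2 + 1)
        if total - a <= max_per_chrom
    ]


def transmissible_totals(father_total, mother_total, max_per_chrom=2):
    """Child totals reachable by inheriting one homolog from each parent."""
    totals = set()
    parent_splits = product(
        haplotype_splits(father_total, max_per_chrom),
        haplotype_splits(mother_total, max_per_chrom),
    )
    for father_split, mother_split in parent_splits:
        # The child receives one chromosome from each parent.
        for father_hap, mother_hap in product(father_split, mother_split):
            totals.add(father_hap + mother_hap)
    return totals


def is_mendelian_consistent(father_total, mother_total, child_total,
                            max_per_chrom=2):
    """True if the child's total could arise by transmission alone."""
    return child_total in transmissible_totals(
        father_total, mother_total, max_per_chrom
    )


if __name__ == "__main__":
    # A parental total of 2 may hide a (0, 2) configuration, so a child
    # total of 1 is still consistent with two "normal-looking" parents.
    print(is_mendelian_consistent(2, 2, 1))
    # Two homozygous-deletion parents cannot transmit a copy, so a child
    # total of 1 points to a de novo event or a calling error.
    print(is_mendelian_consistent(0, 0, 1))
```

A check like this only flags inconsistent trios after the fact. The method described in the abstract instead calls CNVs for the whole family simultaneously, so inconsistent calls are avoided while de novo events remain detectable.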