Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, St. Mary's Hospital, London, UK.
Nat Methods. 2010 Jul;7(7):541-6. doi: 10.1038/nmeth.1466. Epub 2010 May 30.
Although genome-wide association studies have uncovered single-nucleotide polymorphisms (SNPs) associated with complex disease, these variants account for only a small portion of heritability. Some contribution to this 'missing heritability' may come from copy-number variants (CNVs), in particular rare CNVs, but assessing this contribution remains challenging because CNVs, particularly small variants, are difficult to genotype accurately. We report a population-based approach for the identification of CNVs that integrates data from multiple samples and platforms. Our algorithm, cnvHap, jointly learns a chromosome-wide haplotype model of CNVs and cluster-based models of allele intensity at each probe. Using data for 50 French individuals assayed on four separate platforms, we found that cnvHap correctly detected at least 14% more deleted and 50% more amplified genotypes than PennCNV or QuantiSNP, with improvements of 82% and 115%, respectively, for aberrations containing <10 probes. Combining data from multiple platforms additionally improved sensitivity.
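The general idea of calling copy number along a chromosome from probe intensities, and of pooling evidence across platforms, can be illustrated with a much simpler model than the one described above. The following is a minimal sketch and not the cnvHap algorithm: it is a single-sample hidden Markov model with no haplotype or cluster learning. The state set, the per-state mean log R ratio values, the standard deviation, the transition probability, and the assumption that probes from different platforms are already aligned to the same positions are all hypothetical choices made for illustration.

```python
import math

# Hypothetical copy-number states and mean log R ratio per state
# (illustrative values only, not taken from the paper).
STATES = [1, 2, 3]
MEAN_LRR = {1: -0.6, 2: 0.0, 3: 0.4}
SD = 0.2      # assumed intensity noise, shared by all platforms
STAY = 0.99   # assumed probability of keeping the same state between adjacent probes


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_emission(observations, state):
    """Combine platforms by summing their log-likelihoods at one position."""
    return sum(log_gauss(x, MEAN_LRR[state], SD) for x in observations)


def viterbi(probes):
    """Return the most likely copy-number path.

    probes: list of positions, each a list of log R ratio values
    (one value per platform measured at that position).
    """
    if not probes:
        return []
    log_stay = math.log(STAY)
    log_switch = math.log((1 - STAY) / (len(STATES) - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform prior over states at the first position.
    score = {s: -math.log(len(STATES)) + log_emission(probes[0], s) for s in STATES}
    back = []
    for obs in probes[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + trans(p, s))
            pointers[s] = best_prev
            new_score[s] = score[best_prev] + trans(best_prev, s) + log_emission(obs, s)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In this sketch, each additional platform contributes another term to the emission sum at a position, so a short aberration spanning only a few probes accumulates enough evidence to outweigh the cost of two state switches. This is one simple way in which combining platforms could improve sensitivity for small variants.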