Graduate Group in Genomics and Computational Biology.
Department of Pathology and Laboratory Medicine, Perelman School of Medicine.
Bioinformatics. 2018 Jul 15;34(14):2349-2355. doi: 10.1093/bioinformatics/bty104.
Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies now perform CNV analysis using whole-exome sequencing (WES) and whole-genome sequencing (WGS), and in many of these studies previously collected single-nucleotide polymorphism (SNP) array data are also available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet no tool effectively combines data from sequencing and array platforms. CNV detection from sequencing data alone can also be improved by using allele-specific reads.
We propose a statistical framework, the integrated CNV (iCNV) detection algorithm, which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP-array and sequencing data. iCNV applies platform-specific normalization, uses allele-specific reads from sequencing, and integrates matched NGS and SNP-array data through a hidden Markov model. We compare integrated two-platform CNV detection using iCNV to the naïve intersection or union of per-platform calls and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data alone and show that using allele-specific reads improves CNV detection accuracy compared with existing methods.
https://github.com/zhouzilu/iCNV.
Supplementary data are available at Bioinformatics online.