Rheumatology Research Group, Vall d'Hebron Hospital Research Institute, Barcelona, Spain.
PLoS One. 2013 Jul 3;8(7):e68822. doi: 10.1371/journal.pone.0068822. Print 2013.
We present GStream, a method that combines genome-wide single nucleotide polymorphism (SNP) and copy number variant (CNV) genotyping on the Illumina microarray platform with unprecedented accuracy. This new method outperforms previous well-established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves on the results obtained by previous state-of-the-art methods and yields an accuracy close to that of purely CNV-oriented technologies such as Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using the reference CNV calls generated by the 1000 Genomes Project (1KGP) and well-known studies of whole-genome CNV characterization based on either CGH or genotyping microarray technologies, we show that GStream can increase the number of reliably detected variants by up to 25% compared to previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs previously associated with disease risk in published Genome-Wide Association Studies (GWAS). These results could provide important insights into the biological mechanisms underlying the detected disease risk associations. With GStream, large-scale GWAS will benefit not only from the combined genotyping of SNPs and CNVs at an unprecedented accuracy, but also from the computational efficiency of the method.