Group of Molecular Genetics and Systems Biology, Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, Blichers Allé 20, DK-8830 Tjele, Denmark.
BMC Genomics. 2011 Nov 14;12:557. doi: 10.1186/1471-2164-12-557.
Integrating genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires accurate identification of the different types of variation in individual genomes.
We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs located in regions of identity by descent (IBD) or copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were enriched for sizes that are multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, split-read and read-pair approaches proved complementary, each finding different signatures. CNVs were identified from the depth of sequenced reads and by using SNP and CGH arrays.
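As an illustration of the read-depth idea mentioned above, the following is a minimal sketch, not the pipeline used in the study. It assumes depth has already been summarised per fixed-size window; the function name `call_cnv_windows`, the thresholds `gain` and `loss`, and the example depths are all hypothetical.

```python
from statistics import median

def call_cnv_windows(depths, gain=1.5, loss=0.5):
    """Label each window 'gain', 'loss', or 'normal' from its depth ratio.

    depths: mean read depth per window (hypothetical input).
    gain/loss: ratio thresholds relative to the median depth (assumed values).
    """
    baseline = median(depths)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalise")
    calls = []
    for depth in depths:
        ratio = depth / baseline
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls

# Made-up window depths: two windows at roughly double depth, one at under half.
window_depths = [30, 31, 29, 62, 60, 30, 12, 28]
print(call_cnv_windows(window_depths))
```

A real analysis would also correct for GC content and mappability, which is one source of the coverage biases the abstract refers to.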
Our results provide high-resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. SNP detection was more accurate, with little loss of sensitivity, when algorithms that incorporated mapping quality were used. IBD regions were instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that, at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants.
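The trade-off between accuracy and sensitivity when mapping quality is taken into account can be sketched as a concordance check against array genotypes. This is a hedged illustration only: the function `concordance`, the `min_mapq` cutoff, and the toy genotypes are assumptions and do not come from the study.

```python
def concordance(seq_calls, array_genotypes, min_mapq=0):
    """Compare sequencing SNP calls with array genotypes.

    seq_calls: dict position -> (genotype string, mapping quality); hypothetical format.
    array_genotypes: dict position -> genotype string from the SNP array.
    Returns (fraction of compared sites that agree, number of sites compared).
    """
    compared = 0
    matched = 0
    for pos, array_gt in array_genotypes.items():
        call = seq_calls.get(pos)
        if call is None or call[1] < min_mapq:
            continue  # site not called, or filtered out by mapping quality
        compared += 1
        # Sort alleles so that "AG" and "GA" count as the same genotype.
        if sorted(call[0]) == sorted(array_gt):
            matched += 1
    rate = matched / compared if compared else None
    return rate, compared

# Toy data: position 250 is a low-quality, discordant call.
seq = {101: ("AG", 60), 250: ("CC", 15), 400: ("TT", 55), 512: ("GA", 40)}
array = {101: "AG", 250: "CT", 400: "TT", 512: "AG", 900: "CC"}

print(concordance(seq, array))               # no quality filter
print(concordance(seq, array, min_mapq=20))  # filter on mapping quality
```

In this toy case the filter removes the one discordant call, raising concordance at the cost of one fewer site compared.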