Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8, Canada.
Genome Biol. 2010;11(5):R52. doi: 10.1186/gb-2010-11-5-r52. Epub 2010 May 19.
Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertions/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions.
We combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis and detected 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, by 0.1% when considering SNPs, and by approximately 0.3% when considering inversions. The structural variants impact 4,867 genes, and >24% of structural variants would not be imputed by SNP-association.
Our results indicate that a large number of structural variants have gone unreported in the individual genomes published to date. The significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate that they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies.