Lee Gil Ya Cancer and Diabetes Institute, Gachon University, Incheon, Korea.
PLoS One. 2013 Apr 8;8(4):e60585. doi: 10.1371/journal.pone.0060585. Print 2013.
For the robust practice of genomic medicine, sequencing results must be compatible, regardless of the sequencing technologies and algorithms used. Presently, genome sequencing is still an imprecise science and is complicated by differences in chemistry, coverage, alignment, and variant-calling algorithms. We identified ~3.33 million and ~3.62 million single nucleotide variants (SNVs) in the SJK genome using SOLiD and Illumina data, respectively. Approximately 3 million SNVs were concordant between the two platforms, while 68,532 SNVs were discordant; 219,616 SNVs were SOLiD-specific and 516,080 SNVs were Illumina-specific (i.e., platform-specific). Concordant, discordant, and platform-specific SNVs were further analyzed and characterized. Overall, a large portion of the heterozygous SNVs that were discordant with the genotyping calls of single nucleotide polymorphism chips were called with high confidence. Approximately 70% of the platform-specific SNVs were located in regions containing repetitive sequences. Such platform-specificity may arise from differences between the platforms (Illumina vs. SOLiD, respectively) in read length (36 bp and 72 bp vs. 50 bp), insert size (100-300 bp vs. ~1-2 kb), sequencing chemistry (sequencing-by-synthesis using single nucleotides vs. ligation-based sequencing using oligomers), and sequencing quality. When data from the two platforms were merged for variant calling, the proportion of callable regions of the reference genome increased to 99.66%, which was 1.43% higher than the average callability of the two platforms, representing ~40 million bases. In this study, we compared the sequencing results of two sequencing platforms. Approximately 90% of the SNVs were concordant between the two platforms, yet ~10% of the SNVs were either discordant or platform-specific, indicating that each platform had its own strengths and weaknesses.
When data from the two platforms were merged, both the overall callability of the reference genome and the overall accuracy of the SNVs improved, suggesting that a re-sequenced genome can be revised using complementary data.
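
The four SNV categories used above (concordant, discordant, SOLiD-specific, Illumina-specific) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that "concordant" means both platforms call the same genotype at the same site, and that "discordant" means both call the site but with different genotypes; the abstract does not spell out these definitions. The function names, the `(chromosome, position)` keys, and the toy calls are all hypothetical.

```python
from collections import Counter


def normalize(genotype):
    """Return an order-independent form of a genotype such as 'G/A'."""
    return tuple(sorted(genotype.split("/")))


def classify_snvs(solid_calls, illumina_calls):
    """Label every called site by how the two platforms compare.

    Both arguments map (chromosome, position) -> genotype string.
    Assumed definitions (not taken from the paper):
      concordant        : called by both platforms, same genotype
      discordant        : called by both platforms, different genotype
      solid_specific    : called only in the SOLiD data
      illumina_specific : called only in the Illumina data
    """
    labels = {}
    for site in solid_calls.keys() | illumina_calls.keys():
        in_solid = site in solid_calls
        in_illumina = site in illumina_calls
        if in_solid and in_illumina:
            same = normalize(solid_calls[site]) == normalize(illumina_calls[site])
            labels[site] = "concordant" if same else "discordant"
        elif in_solid:
            labels[site] = "solid_specific"
        else:
            labels[site] = "illumina_specific"
    return labels


# Toy calls, invented purely for illustration.
solid = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr2", 50): "T/T"}
illumina = {("chr1", 100): "G/A", ("chr1", 200): "C/T", ("chr3", 10): "G/G"}

labels = classify_snvs(solid, illumina)
summary = Counter(labels.values())
print(summary)
```

Genotypes are normalized before comparison so that `A/G` and `G/A` count as the same call. A real comparison would also need to handle quality filters and regions that are uncallable on one platform, which this sketch ignores.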