SEEDERS Inc., Daejeon 305-509, Korea.
Mol Cells. 2014 Jan;37(1):36-42. doi: 10.14348/molcells.2014.2241. Epub 2014 Jan 27.
The tomato (Solanum lycopersicum L.) is a model plant for genome research in the Solanaceae and for studies of crop breeding. Genome-wide single nucleotide polymorphisms (SNPs) are a valuable resource for genetic research and breeding. However, most methods for discovering genome-wide SNPs require expensive high-depth sequencing. Here, we describe a method for SNP calling that uses a modified version of SAMtools with improved sensitivity. We analyzed 90 Gb of raw next-generation sequencing data, comprising two resequencing and seven transcriptome data sets from several tomato accessions. Our study identified 4,812,432 non-redundant SNPs. Moreover, the SNP calling workflow was improved by aligning the reference genome against its own raw sequence data. Using this approach, 131,785 SNPs were discovered from the transcriptome data of seven accessions. In addition, 4,680,647 SNPs were identified from the genome of S. pimpinellifolium, more than 60 times the 71,637 SNPs identified from the PI212816 transcriptome. SNP distribution was compared between the whole genome and the transcriptome of S. pimpinellifolium, and we also surveyed the location of SNPs within genic and intergenic regions. Our results indicate that the abundance of genome-wide SNP markers and the highly sensitive SNP calling method make marker-assisted breeding and genome-wide association studies feasible.
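The SNP counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a reader's sanity check: the variable and function names are illustrative, and the observation that the transcriptome and genome counts add up to the non-redundant total is drawn from the numbers themselves, not from any statement by the authors about how the total was derived.

```python
# Counts copied from the abstract; names are illustrative, not from the paper.
TRANSCRIPTOME_SNPS_SEVEN_ACCESSIONS = 131_785
GENOME_SNPS_S_PIMPINELLIFOLIUM = 4_680_647
TRANSCRIPTOME_SNPS_PI212816 = 71_637
REPORTED_NONREDUNDANT_TOTAL = 4_812_432


def combined_count(transcriptome_snps: int, genome_snps: int) -> int:
    """Sum of the two reported counts (assumes the two sets do not overlap)."""
    return transcriptome_snps + genome_snps


def fold_difference(genome_snps: int, transcriptome_snps: int) -> float:
    """How many times larger the genome count is than the transcriptome count."""
    return genome_snps / transcriptome_snps


if __name__ == "__main__":
    total = combined_count(
        TRANSCRIPTOME_SNPS_SEVEN_ACCESSIONS, GENOME_SNPS_S_PIMPINELLIFOLIUM
    )
    ratio = fold_difference(
        GENOME_SNPS_S_PIMPINELLIFOLIUM, TRANSCRIPTOME_SNPS_PI212816
    )
    print(f"combined count: {total:,}")
    print(f"matches reported total: {total == REPORTED_NONREDUNDANT_TOTAL}")
    print(f"genome / PI212816 transcriptome: {ratio:.1f}-fold")
```

Under these figures the two counts sum exactly to the reported 4,812,432, and the genome-to-transcriptome ratio comes out at roughly 65-fold, in line with the abstract's figure of about 60 times.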