Department of Plant Sciences, University of California, Davis, CA 95616, USA.
BMC Genomics. 2012 Jul 31;13:354. doi: 10.1186/1471-2164-13-354.
A genome-wide set of single nucleotide polymorphisms (SNPs) is a valuable resource in genetic research and breeding and is usually developed by re-sequencing a genome. If a genome sequence is not available, an alternative strategy must be used. We previously reported the development of a pipeline (AGSNP) for genome-wide SNP discovery in coding sequences and other single-copy DNA without a complete genome sequence in self-pollinating (autogamous) plants. Here we updated this pipeline for SNP discovery in outcrossing (allogamous) species and demonstrated its efficacy in SNP discovery in walnut (Juglans regia L.).
The first step in the original implementation of the AGSNP pipeline was the construction of a reference sequence and the identification of single-copy sequences in it. To identify single-copy sequences, multiple genome equivalents of short SOLiD reads of another individual were mapped to shallow genome coverage of long Sanger or Roche 454 reads making up the reference sequence. The relative depth of SOLiD reads was used to separate repeated sequences from single-copy sequences in the reference sequence. The second step was a search for SNPs between SOLiD reads and the reference sequence. Polymorphism within the mapped SOLiD reads would have precluded SNP discovery; hence both individuals had to be homozygous. The AGSNP pipeline was updated here to use SOLiD or other types of short reads of a heterozygous individual in these two principal steps. A total of 32.6X walnut genome equivalents of SOLiD reads of the vegetatively propagated walnut scion cultivar 'Chandler' were mapped to 48,661 'Chandler' bacterial artificial chromosome (BAC) end sequences (BESs) produced by Sanger sequencing during the construction of a walnut physical map. A total of 22,799 putative SNPs were initially identified. A total of 6,000 Infinium II type SNPs evenly distributed along the walnut physical map were selected for the construction of an Infinium BeadChip, which was used to genotype a walnut mapping population having 'Chandler' as one of the parents. Genotyping results were used to adjust the filtering parameters of the updated AGSNP pipeline. With the adjusted filtering criteria, 69.6% of SNPs discovered with the updated pipeline were real and could be mapped on the walnut genetic map. A total of 13,439 SNPs were discovered by BES re-sequencing. BESs harboring SNPs were in 677 FPC contigs covering 98% of the physical map of the walnut genome.
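The two principal steps described above can be illustrated with a minimal Python sketch. It is not the AGSNP implementation: the function names, the median-based normalization, and every threshold (`low`, `high`, `min_depth`, `min_minor_fraction`) are hypothetical values chosen for illustration only. The first function keeps reference positions whose read depth is close to the typical depth, as a stand-in for the relative-depth filter for single-copy sequence; the second flags a site as a candidate heterozygous SNP when both alleles are well supported by the mapped reads of a single individual.

```python
from statistics import median


def single_copy_positions(depths, low=0.5, high=1.5):
    """Return indices of positions whose depth, relative to the median
    depth of covered positions, lies within [low, high].

    The bounds are hypothetical; positions with much higher relative
    depth are treated as likely repeats, and uncovered positions are
    excluded.
    """
    covered = [d for d in depths if d > 0]
    if not covered:
        return []
    typical = median(covered)
    return [i for i, d in enumerate(depths) if low <= d / typical <= high]


def is_heterozygous_snp(ref_count, alt_count, min_depth=8, min_minor_fraction=0.3):
    """Flag a site as a candidate heterozygous SNP.

    Both thresholds are hypothetical: the site needs at least
    `min_depth` reads in total, and the less frequent allele must make
    up at least `min_minor_fraction` of them.
    """
    total = ref_count + alt_count
    if total < min_depth:
        return False
    return min(ref_count, alt_count) / total >= min_minor_fraction


# Hypothetical per-position read depths along one reference sequence.
depths = [30, 32, 95, 31, 0, 33, 120, 29]
kept = single_copy_positions(depths)

# Hypothetical allele counts (reference allele, alternative allele).
balanced = is_heterozygous_snp(16, 15)
skewed = is_heterozygous_snp(30, 1)
shallow = is_heterozygous_snp(3, 2)
```

In this toy input, the positions with depths 95 and 120 are excluded as likely repeats and the uncovered position is dropped. A site with balanced allele counts passes the heterozygosity check, while a strongly skewed site or a shallowly covered one does not. In practice such thresholds would be tuned against genotyping results, as the abstract describes for the filtering parameters.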
The updated AGSNP pipeline is a versatile tool for high-throughput, genome-wide SNP discovery in both autogamous and allogamous species. With this pipeline, a large set of SNPs was identified in a single walnut cultivar.