Centre of Excellence in Genomics, International Crops Research Institute for the Semi-Arid Tropics, Patancheru 502324, Andhra Pradesh, India.
Am J Bot. 2012 Feb;99(2):186-92. doi: 10.3732/ajb.1100419. Epub 2012 Feb 1.
Next-generation sequencing (NGS) technologies are frequently used for resequencing and for mining single nucleotide polymorphisms (SNPs) by comparison to a reference genome. In crop species such as chickpea (Cicer arietinum) that lack a reference genome sequence, NGS-based SNP discovery is a challenge. Therefore, instead of probability-based statistical approaches that call a consensus by comparison with a reference sequence, a coverage-based consensus calling (CbCC) approach was applied, and two genotypes were compared for SNP identification.
The CbCC approach was used in this study with four commonly used short-read alignment tools (Maq, Bowtie, Novoalign, and SOAP2), 15.7 million and 22.1 million Illumina reads for chickpea genotypes ICC4958 and ICC1882, respectively, and the chickpea transcriptome assembly (CaTA).
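The abstract does not give the CbCC rules or thresholds, so the following is only a minimal sketch of what a coverage-based consensus comparison between two genotypes could look like. It assumes that each genotype's reads have already been aligned to a common set of contigs and summarized as per-site base observations. The function names, the input layout, and the `min_depth` and `min_fraction` values are all hypothetical and are not taken from the study.

```python
from collections import Counter


def consensus_calls(pileup, min_depth=5, min_fraction=0.8):
    """Call one consensus base per site from read coverage alone.

    pileup maps (contig, position) to a string of the bases observed
    in the reads covering that site. Both thresholds are illustrative
    assumptions, not values reported by the study.
    """
    calls = {}
    for site, bases in pileup.items():
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to trust a consensus at this site
        base, count = Counter(bases).most_common(1)[0]
        if count / depth >= min_fraction:
            calls[site] = base
    return calls


def candidate_snps(calls_a, calls_b):
    """Report sites where both genotypes have a consensus and the bases differ."""
    shared_sites = calls_a.keys() & calls_b.keys()
    return {
        site: (calls_a[site], calls_b[site])
        for site in shared_sites
        if calls_a[site] != calls_b[site]
    }


# Toy data: only the first site has enough depth and a differing consensus.
pileup_a = {("contig1", 10): "AAAAAA", ("contig1", 20): "GGGGG", ("contig1", 30): "TT"}
pileup_b = {("contig1", 10): "CCCCCC", ("contig1", 20): "GGGGG", ("contig1", 30): "AA"}
snps = candidate_snps(consensus_calls(pileup_a), consensus_calls(pileup_b))
print(snps)
```

In this sketch a site is compared only when both genotypes pass the depth filter, which means no reference genome call is needed: the two consensus sequences are compared directly with each other.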
A nonredundant set of 4543 SNPs was identified between the two chickpea genotypes. Experimental validation of 224 randomly selected SNPs showed that Maq was the most accurate individual tool: 50.0% of the SNPs it predicted were true SNPs. For combinations of two tools, the greatest accuracy (55.7%) was obtained with Maq and Bowtie, and the combination of Bowtie, Maq, and Novoalign identified 61.5% true SNPs. SNP prediction accuracy generally increased with increasing read depth.
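The accuracy figures for single tools and tool combinations can be illustrated with a short sketch. It assumes that a "combination" means the SNPs predicted by every tool in the group (the intersection), and that accuracy is the fraction of experimentally tested SNPs in that set that were confirmed. The abstract does not state either definition explicitly, so both are assumptions, and the function name and toy data below are hypothetical.

```python
from itertools import combinations


def combination_accuracy(predictions, validation, size):
    """Validation rate for every group of `size` tools.

    predictions maps a tool name to the set of SNP identifiers it predicted.
    validation maps an SNP identifier to True (confirmed) or False (not
    confirmed) for the SNPs that were tested experimentally.
    """
    results = {}
    for tools in combinations(sorted(predictions), size):
        # SNPs predicted by all tools in this group (assumed definition).
        shared = set.intersection(*(predictions[tool] for tool in tools))
        tested = [snp for snp in shared if snp in validation]
        if tested:  # skip groups with no experimentally tested SNPs
            results[tools] = sum(validation[snp] for snp in tested) / len(tested)
    return results


# Toy data with placeholder tool names and SNP identifiers.
predictions = {
    "toolA": {1, 2, 3, 4},
    "toolB": {2, 3, 4, 5},
    "toolC": {3, 4, 6},
}
validation = {1: False, 2: False, 3: True, 4: True, 5: False}

single = combination_accuracy(predictions, validation, 1)
pairs = combination_accuracy(predictions, validation, 2)
print(single)
print(pairs)
```

Requiring agreement between tools shrinks the candidate set, so a higher validation rate for a combination comes at the cost of fewer predicted SNPs.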
This study provides a benchmark comparison of four commonly used tools, and of read depths, for NGS-based SNP discovery in a crop species without a reference genome sequence. In addition, a large number of SNPs were identified in chickpea that should be useful for molecular breeding.