Crop Improvement and Genetics Research Unit, Western Regional Research Center, U.S. Department of Agriculture-Agricultural Research Service, Albany, CA 94710, USA.
Department of Genomics and Genome Editing, Montana BioAgriculture Inc., Missoula, MT 59802, USA.
Int J Mol Sci. 2021 Sep 27;22(19):10400. doi: 10.3390/ijms221910400.
The highly challenging hexaploid wheat (Triticum aestivum) genome is becoming ever more accessible thanks to the continued development of multiple reference genomes, which aids the effort to better understand variation in important traits. Although the process of variant calling is relatively straightforward, selecting the best combination of computational tools for the read alignment and variant calling stages of the analysis, and efficiently filtering out false variant calls, are not always easy tasks. Previous studies have analyzed the impact of these methods on quality metrics in diploid organisms. Given that variant identification in wheat largely relies on accurate mining of exome data, there is a critical need to better understand how different methods affect the analysis of whole exome sequencing (WES) data in polyploid species. This study addresses that need by performing whole exome sequencing of 48 wheat cultivars and assessing the performance of various variant calling pipelines at their suggested settings. The results show that all the pipelines require filtering to eliminate false-positive calls. The high consensus among the reference SNPs called by the best-performing pipelines suggests that filtering provides accurate and reproducible results. This study also provides detailed comparisons of sensitivity and precision at the individual and population levels for the raw and filtered SNP calls.
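As an illustration of the two metrics named in the abstract, the following minimal Python sketch computes precision and sensitivity for a set of SNP calls against a reference set. It is not the study's own code: the function name, the representation of a variant as a (chromosome, position, ref, alt) tuple, and the example data are all assumptions made for illustration.

```python
def precision_sensitivity(called, reference):
    """Return (precision, sensitivity) of called SNPs against a reference set.

    Each SNP is a hashable key; here we assume a
    (chromosome, position, ref_allele, alt_allele) tuple.
    """
    called, reference = set(called), set(reference)
    true_pos = len(called & reference)    # called and present in the reference
    false_pos = len(called - reference)   # called but absent from the reference
    false_neg = len(reference - called)   # in the reference but not called

    # Guard against empty inputs to avoid division by zero.
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if reference else 0.0
    return precision, sensitivity


# Hypothetical example data, not taken from the study.
raw_calls = {
    ("1A", 100, "A", "G"),
    ("1A", 250, "C", "T"),
    ("2B", 40, "G", "A"),
    ("3D", 77, "T", "C"),
}
reference_snps = {
    ("1A", 100, "A", "G"),
    ("2B", 40, "G", "A"),
    ("5A", 900, "C", "G"),
}

precision, sensitivity = precision_sensitivity(raw_calls, reference_snps)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

Running the same function on the raw and on the filtered call set of a pipeline gives a direct view of how much precision filtering gains and how much sensitivity it costs.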