State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, 100193, China.
Genetalks Biotech. Co., Ltd, Changsha, 410000, Hunan, China.
Sci Rep. 2020 Oct 30;10(1):18712. doi: 10.1038/s41598-020-74526-7.
There is generally one standard reference sequence for each species. When other breeds of the species carry extensive variation, this can lead to ambiguous alignment and inaccurate variant calling, which in turn compromises the accuracy of downstream analyses. Here, with the help of the FPGA hardware platform, we present a method that generates an alternative reference via an iterative strategy to improve read alignment for breeds that are genetically distant from the reference breed. Compared with the published reference genomes, the alternative reference sequences we built improved the mapping rates of Chinese indigenous pigs and chickens by 0.61–1.68% and 0.09–0.45%, respectively. These sequences also enable researchers to recover highly variable regions that could be missed when using public reference sequences. We also determined that the optimal number of iterations needed to generate alternative reference sequences was seven for pigs and five for chickens. Our results show that, for genetically distant breeds, generating an alternative reference sequence can facilitate read alignment and variant calling and improve the accuracy of downstream analyses.
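The abstract does not spell out the iterative procedure, so the following is only a toy illustration of why iterating can help. It assumes a simple loop: map reads to the current reference, take the majority base at each covered position, write those bases back, and repeat until the reference stops changing. The functions `best_hit` and `refine_reference`, the brute-force Hamming-distance "aligner", and the `max_mm` mismatch threshold are hypothetical stand-ins (substitutions only, no indels, no base qualities, no FPGA acceleration) and are not the authors' pipeline.

```python
from collections import Counter


def best_hit(read, ref, max_mm):
    """Hypothetical toy aligner: return the earliest offset with the fewest
    mismatches (substitutions only), or None if every offset exceeds max_mm."""
    best_pos, best_mm = None, max_mm + 1
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best_pos, best_mm = start, mm
    return best_pos


def refine_reference(ref, reads, max_mm=1, max_iter=10):
    """Illustrative iterative refinement (an assumption, not the paper's method):
    each round maps reads, builds a pileup, and replaces every covered base with
    the majority read base. Returns the final reference and the per-round
    mapping rate."""
    history = []
    for _ in range(max_iter):
        pileup = [Counter() for _ in ref]
        mapped = 0
        for read in reads:
            pos = best_hit(read, ref, max_mm)
            if pos is None:
                continue  # read too divergent from the current reference
            mapped += 1
            for i, base in enumerate(read):
                pileup[pos + i][base] += 1
        history.append(mapped / len(reads))
        # Uncovered positions keep the current reference base.
        new_ref = "".join(
            col.most_common(1)[0][0] if col else base
            for col, base in zip(pileup, ref)
        )
        if new_ref == ref:
            break  # converged: another round would change nothing
        ref = new_ref
    return ref, history


if __name__ == "__main__":
    sample = "AACCGGTT"     # toy genome of a "distant breed"
    reference = "AACTAGTT"  # differs from the sample at two adjacent sites
    reads = [sample[i:i + 4] for i in range(len(sample) - 3)]
    new_ref, rates = refine_reference(reference, reads, max_mm=1)
    print(new_ref, rates)  # AACCGGTT [0.4, 1.0]
```

In the first round only the two reads that overlap a single divergent site map; their bases correct the reference, so every read maps in the second round. This mirrors, in miniature, how repeated rounds can recover highly variable regions that a single alignment to the original reference misses.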