School of Life Science, Huizhou University, Huizhou, 516007, China.
State Key Laboratory of Plant Molecular Genetics, National Center for Gene Research, Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, 200233, China.
Theor Appl Genet. 2022 Jun;135(6):2157-2166. doi: 10.1007/s00122-022-04105-z. Epub 2022 May 3.
This study developed a new genotyping method that accurately infers heterozygous genotypes from complex plant genome sequence data, which helps discover new alleles in association studies. Many software packages and pipelines have been developed to handle sequence data from model species. However, genotyping complex heterozygous plant genomes requires further improvement over these previous methods. Here we present a new pipeline, HetMap (available at https://github.com/Ncgrhg/HetMapv1), for variant calling and missing-genotype imputation from low-coverage sequence data for heterozygous plant genomes. To assess the performance of HetMap on real sequence data, we applied it to an F1 hybrid rice population of 1495 samples and a wild rice population of 446 samples. High-coverage sequence data from four hybrid rice accessions and two wild rice accessions, which were also included in the low-coverage data, were used to validate the accuracy of genotype inference. The validation showed that HetMap achieved a significant improvement in heterozygous genotype inference accuracy (13.65% for hybrid rice, 26.05% for wild rice) and in total accuracy compared with similar software packages. Applying the new genotypes in a genome-wide association study also improved association power for the awn length phenotype in wild rice. HetMap can achieve high genotype inference accuracy at low sequence coverage in small populations, for both natural and constructed recombination populations. HetMap provides a powerful tool for analyzing sequence data from heterozygous plant genomes, and may help discover new phenotype-associated regions in plant species with complex heterozygous genomes.
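The validation step described above, comparing genotypes inferred from low-coverage data against high-coverage calls for the same accessions, can be illustrated with a minimal sketch. This is not HetMap's own code: the function name `genotype_accuracy`, the 0/1/2 genotype coding, and the definition of heterozygous accuracy as the fraction of truly heterozygous sites that are called heterozygous are all assumptions made for illustration, and the paper may define these metrics differently.

```python
from typing import Dict, Optional, Tuple

# Hypothetical coding (an assumption, not taken from HetMap):
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate,
# None = missing genotype.
Site = Tuple[str, int]  # (chromosome, position)
Genotypes = Dict[Site, Optional[int]]


def genotype_accuracy(inferred: Genotypes, truth: Genotypes) -> Dict[str, float]:
    """Compare inferred genotypes with high-coverage 'truth' genotypes.

    Only sites called in both sets are compared. Returns the total
    concordance and the concordance restricted to sites that are
    heterozygous in the truth set.
    """
    total = total_ok = het = het_ok = 0
    for site, true_gt in truth.items():
        call = inferred.get(site)
        if true_gt is None or call is None:
            continue  # skip sites missing in either data set
        total += 1
        total_ok += int(call == true_gt)
        if true_gt == 1:
            het += 1
            het_ok += int(call == true_gt)
    return {
        "total_accuracy": total_ok / total if total else float("nan"),
        "het_accuracy": het_ok / het if het else float("nan"),
    }


# Toy data for one accession (made up for illustration only).
truth = {("chr1", 100): 0, ("chr1", 200): 1, ("chr1", 300): 1, ("chr1", 400): 2}
inferred = {("chr1", 100): 0, ("chr1", 200): 1, ("chr1", 300): 0, ("chr1", 400): 2}

result = genotype_accuracy(inferred, truth)
print(result)
```

In this toy case one heterozygous site is miscalled as homozygous reference, which lowers the heterozygous accuracy more than the total accuracy. That is the kind of gap a heterozygote-aware caller is meant to close.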