Sequence coverage required for accurate genotyping by sequencing in polyploid species.

Authors

Wang Lin, Yang Jixuan, Zhang Hong, Tao Qin, Zhang Yuxin, Dang Zhenyu, Zhang Fengjun, Luo Zewei

Affiliations

Laboratory of Population and Quantitative Genetics, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China.

Department of Statistics and Finance, University of Science and Technology of China, Hefei, China.

Publication information

Mol Ecol Resour. 2022 May;22(4):1417-1426. doi: 10.1111/1755-0998.13558. Epub 2021 Dec 20.

Abstract

Polyploidy plays an important role in the evolution of eukaryotes, especially flowering plants. Many ecologically or agronomically important plant and crop species are polyploids, including sycamore maple (tetraploid) and the world's second and third largest food crops, wheat (hexaploid) and potato (tetraploid), as are economically important aquaculture animals such as Atlantic salmon and trout. Next-generation sequencing data make it possible to assign a genotype at a sequence variant site, an approach known as genotyping by sequencing (GBS). GBS has stimulated enormous interest in population-based genomics studies in almost all diploid and many polyploid organisms. DNA sequence polymorphisms are codominant and thus fully informative about the underlying genotype at the polymorphic site, making GBS a straightforward task in diploids. In polyploid species, however, sequence data are often uninformative about the genotype, making GBS a far more challenging task. This paper presents novel and rigorous statistical methods for predicting the number of sequence reads needed to ensure accurate GBS at a polymorphic site covered by the reads in polyploids. It shows that about a dozen reads can ensure a 95% probability of recovering all constituent alleles of any tetraploid genotype, but that several hundred reads are needed to call the genotype accurately with a probability of 90%, which challenges the proposition in the literature that GBS can rely on low-coverage sequence data. The theoretical prediction was tested using RAD-seq data from tetraploid potato cultivars. The paper provides polyploid experimentalists with theoretical guidance and methods for designing and conducting their sequence-based studies.
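The allele-recovery figure quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' method. It assumes an idealized model in which each read is drawn independently and uniformly from the four chromosome copies, with no sequencing error or allelic bias. The function names `prob_all_alleles_seen` and `min_reads` are hypothetical and introduced here only for illustration.

```python
def prob_all_alleles_seen(dosages, n_reads):
    """Probability that n_reads reads include every allele at least once.

    dosages: copy number of each distinct allele, e.g. (3, 1) for AAAB.
    Assumption (not from the paper): reads sample chromosome copies
    independently and uniformly, with no sequencing error.
    """
    ploidy = sum(dosages)
    k = len(dosages)
    total = 0.0
    # Inclusion-exclusion over every subset of alleles that could be missed.
    for mask in range(1 << k):
        missed = sum(d for i, d in enumerate(dosages) if (mask >> i) & 1)
        sign = -1 if bin(mask).count("1") % 2 else 1
        total += sign * (1 - missed / ploidy) ** n_reads
    return total


def min_reads(dosages, target=0.95):
    """Smallest read count whose recovery probability reaches the target."""
    n = 1
    while prob_all_alleles_seen(dosages, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Biallelic tetraploid heterozygotes: simplex (AAAB) and duplex (AABB).
    for genotype in [(3, 1), (2, 2)]:
        print(genotype, min_reads(genotype))
```

Under these simplifying assumptions, the simplex genotype AAAB is the limiting biallelic case and reaches 95% at 11 reads, in line with the abstract's "about a dozen reads". The sketch says nothing about the several hundred reads needed to call allele dosage accurately, which depends on the full statistical model in the paper.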
