MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
BMC Genomics. 2022 Oct 28;23(1):732. doi: 10.1186/s12864-022-08961-3.
Structural variants (SVs) play important roles in adaptive evolution and species diversification. In plants especially, many phenotypic responses to the environment have been found to be associated with SVs. Despite the prevalence and significance of SVs, long insertions remain poorly detected and studied in all but model species.
We used paired-read whole-genome resequencing data from 80 Asian butternuts to detect long insertions and to analyse their characteristics and potential functional effects. By combining mapping-based and de novo assembly-based methods, we obtained a pangenome of multiple related species representing higher taxonomic groups. We obtained 89,312 distinct contigs totaling 147,773,999 base pairs (bp) of new sequence, of which 347 were putative long insertions placed in the reference genome. Most of the putative long insertions appeared in multiple species; in contrast, only 62 appeared in a single species, and these may be involved in the response to the environment. A total of 65 putative long insertions fell within 61 distinct protein-coding genes involved in plant development, and 105 fell upstream of 106 distinct protein-coding genes involved in cellular respiration. In total, 3,367 genes were annotated in 2,606 contigs. We propose PLAINS ( https://github.com/CMB-BNU/PLAINS.git ), a streamlined, comprehensive pipeline for the prediction and analysis of long insertions using whole-genome resequencing.
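To make the genic versus upstream distinction concrete, the following is a minimal Python sketch of how a placed insertion could be labelled relative to annotated genes. It is an illustration only and is not taken from PLAINS: the `Gene` record, the `classify_insertion` function, the example coordinates, and the 2,000 bp upstream window are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with 1-based inclusive coordinates."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


# Assumed window size; the abstract does not state how "upstream" was defined.
UPSTREAM_WINDOW = 2000


def classify_insertion(chrom, pos, genes, window=UPSTREAM_WINDOW):
    """Return (gene name, label) pairs for every gene the insertion site hits.

    label is "genic" if pos lies inside the gene, or "upstream" if pos lies
    within `window` bp before the gene's 5' end (strand-aware).
    """
    hits = []
    for g in genes:
        if g.chrom != chrom:
            continue
        if g.start <= pos <= g.end:
            hits.append((g.name, "genic"))
        elif g.strand == "+" and g.start - window <= pos < g.start:
            hits.append((g.name, "upstream"))
        elif g.strand == "-" and g.end < pos <= g.end + window:
            hits.append((g.name, "upstream"))
    return hits


# Made-up example annotation, not real butternut genes.
genes = [
    Gene("geneC", "chr1", 1000, 2000, "-"),
    Gene("geneA", "chr1", 5000, 8000, "+"),
]

inside = classify_insertion("chr1", 6000, genes)
between = classify_insertion("chr1", 3500, genes)
far = classify_insertion("chr1", 20000, genes)
```

In this toy annotation, the site at position 3500 lies upstream of both divergently transcribed genes, which shows one way the number of affected genes can exceed the number of insertions under such a definition.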
Our study lays an important foundation for further whole-genome studies of long insertions and enables experimental investigation of their effects.