The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes.

Affiliation

Department of Life Sciences, School of Plant Sciences and Food Security, Tel-Aviv University, Tel Aviv, Israel.

Publication information

Genome Biol Evol. 2023 Jul 3;15(7). doi: 10.1093/gbe/evad121.

Abstract

Pan-genomics is an emerging approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data with a single reference genome, the construction of a pan-genome (PG) involves the direct comparison of multiple genomes to one another, thereby enabling the detection of genomic sequences and genes not present in the reference, as well as the analysis of gene content diversity. Although multiple studies describing PGs of various plant species have been published in recent years, a better understanding regarding the effect of the computational procedures used for PG construction could guide researchers in making more informed methodological decisions. Here, we examine the effect of several key methodological factors on the obtained gene pool and on gene presence-absence detections by constructing and comparing multiple PGs of Arabidopsis thaliana and cultivated soybean, as well as conducting a meta-analysis on published PGs. These factors include the construction method, the sequencing depth, and the extent of input data used for gene annotation. We observe substantial differences between PGs constructed using three common procedures (de novo assembly and annotation, map-to-pan, and iterative assembly) and that results are dependent on the extent of the input data. Specifically, we report low agreement between the gene content inferred using different procedures and input data. Our results should increase the awareness of the community to the consequences of methodological decisions made during the process of PG construction and emphasize the need for further investigation of commonly applied methodologies.
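To make the notion of "agreement between the gene content inferred using different procedures" concrete, here is a minimal Python sketch of one way such a comparison could be scored. It is an illustration only, not the authors' pipeline: the function name `compare_pavs`, the dictionary layout, and the toy gene and accession names are all hypothetical, and the sketch assumes that genes from the two PGs have already been matched to shared identifiers (in practice that matching is a non-trivial step of its own).

```python
def compare_pavs(pav_a, pav_b, accessions):
    """Compare two gene presence-absence tables (hypothetical layout).

    pav_a, pav_b: dict mapping gene ID -> set of accessions in which
                  the gene was called present.
    accessions:   list of accession names shared by both PGs.
    Returns (gene_pool_jaccard, call_agreement).
    """
    genes_a, genes_b = set(pav_a), set(pav_b)
    shared = genes_a & genes_b
    union = genes_a | genes_b

    # Overlap of the two gene pools (Jaccard index).
    gene_pool_jaccard = len(shared) / len(union) if union else 1.0

    # Fraction of (gene, accession) calls that agree, over shared genes only.
    total = len(shared) * len(accessions)
    matches = sum(
        (acc in pav_a[gene]) == (acc in pav_b[gene])
        for gene in shared
        for acc in accessions
    )
    call_agreement = matches / total if total else 1.0
    return gene_pool_jaccard, call_agreement


# Toy data: invented values, not taken from the study.
accessions = ["acc1", "acc2", "acc3"]
de_novo = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1"},
    "geneC": {"acc2", "acc3"},
}
map_to_pan = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc2"},
    "geneD": {"acc3"},
}

jaccard, agreement = compare_pavs(de_novo, map_to_pan, accessions)
print(f"gene pool Jaccard: {jaccard:.2f}, call agreement: {agreement:.2f}")
```

Reporting the two numbers separately keeps disagreement about which genes exist in the pool distinct from disagreement about presence-absence calls for genes both PGs contain.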

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e9a/10340445/f2dd0f7a7241/evad121f1.jpg
