Robust and efficient software for reference-free genomic diversity analysis of genotyping-by-sequencing data on diploid and polyploid species.

Author information

Parra-Salazar Andrea, Gomez Jorge, Lozano-Arce Daniela, Reyes-Herrera Paula H, Duitama Jorge

Affiliations

Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia.

Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), Bogotá, Colombia.

Publication information

Mol Ecol Resour. 2022 Jan;22(1):439-454. doi: 10.1111/1755-0998.13477. Epub 2021 Jul 29.

Abstract

Genotyping-by-sequencing (GBS) is a widely used and cost-effective technique for obtaining large numbers of genetic markers from populations by sequencing regions adjacent to restriction cut sites. Although a standard reference-based pipeline can be followed to analyse GBS reads, a reference genome is still not available for a large number of species. Hence, reference-free approaches are required to generate the genetic variability information that can be obtained from a GBS experiment. Unfortunately, available tools for de novo analysis of GBS reads face issues of usability, accuracy and performance. Furthermore, few available tools are suitable for analysing data sets from polyploid species. In this manuscript, we describe a novel algorithm to perform reference-free variant detection and genotyping from GBS reads. Nonexact searches on a dynamic hash table of consensus sequences allow for efficient read clustering and sorting. This algorithm was integrated into the Next Generation Sequencing Experience Platform (NGSEP) to take advantage of the state-of-the-art variant detector already implemented in this tool. We performed benchmark experiments with three different empirical data sets of plants and animals with different population structures and ploidies, sequenced with different GBS protocols at different read depths. These experiments show that NGSEP achieves accuracy comparable to, and in some cases better than, existing solutions, and always better computational efficiency. We expect that this new development will be useful for many research groups conducting population genetic studies in a wide variety of species.
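The abstract says only that nonexact searches on a dynamic hash table of consensus sequences drive read clustering. It does not describe the data structure itself. What follows is a minimal sketch of one way such a search could work. It is not NGSEP's implementation.

Everything in the sketch is a hypothetical illustration, not taken from the paper:
- the class `ConsensusIndex`;
- its parameters `prefix_len` and `max_mismatches`;
- the pigeonhole seeding scheme.

The prefix is split into `max_mismatches + 1` segments. Any consensus within `max_mismatches` mismatches must then share at least one segment exactly, so exact hash lookups on segments find all candidate clusters. For simplicity, each cluster's representative is the first read's prefix; a fuller version would update a true consensus.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions over the shared length of a and b."""
    return sum(x != y for x, y in zip(a, b))


class ConsensusIndex:
    """Hypothetical sketch (not NGSEP's code): cluster reads whose first
    `prefix_len` bases match a cluster representative with at most
    `max_mismatches` differences. Assumes reads are at least prefix_len long."""

    def __init__(self, prefix_len: int = 32, max_mismatches: int = 2):
        self.prefix_len = prefix_len
        self.max_mismatches = max_mismatches
        # Pigeonhole: with at most m mismatches spread over m + 1 disjoint
        # segments, at least one segment must match exactly.
        n_seg = max_mismatches + 1
        step = prefix_len // n_seg
        self.segments = [
            (i * step, (i + 1) * step if i < n_seg - 1 else prefix_len)
            for i in range(n_seg)
        ]
        # Dynamic hash table: (segment number, segment sequence) -> cluster ids.
        self.index = defaultdict(set)
        self.consensus = []  # cluster id -> representative prefix
        self.members = []    # cluster id -> reads assigned to that cluster

    def _keys(self, prefix: str):
        return [(i, prefix[s:e]) for i, (s, e) in enumerate(self.segments)]

    def add_read(self, read: str) -> int:
        """Assign a read to the closest matching cluster or open a new one."""
        prefix = read[: self.prefix_len]
        candidates = set()
        for key in self._keys(prefix):
            candidates |= self.index.get(key, set())
        best = None
        for cid in sorted(candidates):
            d = hamming(prefix, self.consensus[cid])
            if d <= self.max_mismatches and (best is None or d < best[0]):
                best = (d, cid)
        if best is not None:
            self.members[best[1]].append(read)
            return best[1]
        # No cluster within the mismatch budget: the table grows dynamically.
        cid = len(self.consensus)
        self.consensus.append(prefix)
        self.members.append([read])
        for key in self._keys(prefix):
            self.index[key].add(cid)
        return cid


if __name__ == "__main__":
    idx = ConsensusIndex(prefix_len=12, max_mismatches=1)
    reads = ["ACGTACGTACGTTT", "ACGTACGTACGAAA", "TTTTTTTTTTTTTT", "ACGAACGTACGAGG"]
    print([idx.add_read(r) for r in reads])  # [0, 0, 1, 2]
```

In the demo, the second read differs from the first by one base, so it joins cluster 0. The fourth read differs by two bases and therefore starts its own cluster. The table only ever does exact lookups on segments, which keeps each search cheap. The Hamming check then enforces the mismatch limit.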
