SambaR: An R package for fast, easy and reproducible population-genetic analyses of biallelic SNP data sets.

Affiliations

Department of Biosciences, Durham University, Durham, UK.

Biodiversity and Climate Research Centre, Senckenberg Institute, Frankfurt am Main, Germany.

Publication information

Mol Ecol Resour. 2021 May;21(4):1369-1379. doi: 10.1111/1755-0998.13339. Epub 2021 Feb 20.

DOI:10.1111/1755-0998.13339

PMID:33503314

Abstract

SNP data sets can be used to infer a wealth of information about natural populations, including their structure, genetic diversity, and the presence of loci under selection. However, SNP data analysis can be a time-consuming and challenging process, not least because many different software packages are currently needed to execute and depict the wide variety of mainstream population-genetic analyses. Here, we present SambaR, an integrative and user-friendly R package which automates and simplifies quality control and population-genetic analyses of biallelic SNP data sets. SambaR allows users to perform mainstream population-genetic analyses and to generate a wide variety of publication-ready graphs with a minimal number of commands (fewer than 10). These wrapper commands call functions of existing packages (including adegenet, ape, LEA, poppr, pcadapt and StAMPP) as well as new tools uniquely implemented in SambaR. We tested SambaR on SNP data sets available online and found that SambaR can process data sets of over 100,000 SNPs and hundreds of individuals within hours, given sufficient computing power. Newly developed tools implemented in SambaR facilitate optimization of filter settings and objective interpretation of ordination analyses, enhance comparability of diversity estimates from reduced representation library SNP data sets, and generate reduced SNP panels and structure-like plots with Bayesian population assignment probabilities. SambaR facilitates rapid population-genetic analyses of biallelic SNP data sets by removing three major time sinks: file handling, software learning, and data plotting. In addition, SambaR provides a convenient platform for SNP data storage and management, as well as several new utilities, including guidance in setting appropriate data filters. The SambaR source script, manual and example data set are distributed through GitHub: https://github.com/mennodejong1986/SambaR.
