4P: fast computing of population genetics statistics from large DNA polymorphism panels.

Author information

Benazzo Andrea, Panziera Alex, Bertorelle Giorgio

Affiliations

Department of Life Sciences and Biotechnology, University of Ferrara, via L. Borsari 46, 44100 Ferrara, Italy.

Department of Life Sciences and Biotechnology, University of Ferrara, via L. Borsari 46, 44100 Ferrara, Italy; Department of Biodiversity and Molecular Ecology, Fondazione Edmund Mach, via E. Mach 1, 38010 S. Michele all'Adige, Italy.

Publication information

Ecol Evol. 2015 Jan;5(1):172-5. doi: 10.1002/ece3.1261. Epub 2014 Dec 11.

Abstract

Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable to analyze multiple data sets produced in simulation studies. Unix, Windows, and MacOs versions are provided, as well as the source code for easier pipeline implementations.
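For readers unfamiliar with the joint frequency spectrum mentioned in the abstract, the following is a minimal sketch of how it can be tallied for two populations. It is not 4P's implementation or input format: the function names, the in-memory genotype layout (one list per site holding each diploid individual's derived-allele count, 0, 1, or 2), and the chunk size are all assumptions made for illustration. The sites are processed in independent chunks whose partial spectra are summed, which is the property that makes this kind of statistic straightforward to parallelize; here the chunks are simply processed one after another.

```python
def partial_jsfs(sites, pop_a, pop_b):
    """Tally sites by (derived count in A, derived count in B).

    sites: list of sites, each a list of derived-allele counts (0/1/2),
           one per diploid individual (hypothetical layout).
    pop_a, pop_b: lists of individual indices belonging to each population.
    """
    dim_a = 2 * len(pop_a) + 1  # possible counts 0..2*n_a
    dim_b = 2 * len(pop_b) + 1
    jsfs = [[0] * dim_b for _ in range(dim_a)]
    for site in sites:
        i = sum(site[k] for k in pop_a)
        j = sum(site[k] for k in pop_b)
        jsfs[i][j] += 1
    return jsfs


def merge(left, right):
    """Element-wise sum of two spectra with identical dimensions."""
    return [[x + y for x, y in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


def joint_sfs(sites, pop_a, pop_b, chunk_size=2):
    """Joint spectrum built by summing per-chunk partial spectra."""
    total = partial_jsfs([], pop_a, pop_b)  # all-zero spectrum
    for start in range(0, len(sites), chunk_size):
        chunk = sites[start:start + chunk_size]
        total = merge(total, partial_jsfs(chunk, pop_a, pop_b))
    return total


# Toy data (invented for this example): 4 individuals, 5 sites.
example_sites = [
    [0, 1, 0, 0],
    [2, 2, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 2],
    [0, 1, 0, 0],
]
pop_a = [0, 1]  # individuals 0 and 1
pop_b = [2, 3]  # individuals 2 and 3

jsfs = joint_sfs(example_sites, pop_a, pop_b)
for row in jsfs:
    print(row)
```

Each cell `jsfs[i][j]` holds the number of sites at which the derived allele is seen `i` times in the first population and `j` times in the second. Because chunk results only need to be added together, the per-chunk work could be handed to separate processes or cores without changing the result.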

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0086/4298444/9600745dff95/ece30005-0172-f1.jpg
