4P: fast computing of population genetics statistics from large DNA polymorphism panels.

Authors

Benazzo Andrea, Panziera Alex, Bertorelle Giorgio

Affiliations

Department of Life Sciences and Biotechnology, University of Ferrara, via L. Borsari 46, 44100 Ferrara, Italy.

Department of Life Sciences and Biotechnology, University of Ferrara, via L. Borsari 46, 44100 Ferrara, Italy; Department of Biodiversity and Molecular Ecology, Fondazione Edmund Mach, via E. Mach 1, 38010 S. Michele all'Adige, Italy.

Publication information

Ecol Evol. 2015 Jan;5(1):172-5. doi: 10.1002/ece3.1261. Epub 2014 Dec 11.

DOI: 10.1002/ece3.1261
PMID: 25628874
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4298444/

Abstract

Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable to analyze multiple data sets produced in simulation studies. Unix, Windows, and MacOs versions are provided, as well as the source code for easier pipeline implementations.
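
The joint frequency spectrum mentioned in the abstract can be illustrated with a minimal sketch. This is not 4P's implementation or its input format: the function names (`joint_sfs`, `merge_spectra`, `chunked`) and the toy data layout (one list of 0/1 alleles per population per site, with 1 marking the derived allele) are assumptions made for illustration only. The sketch counts sites by the number of derived alleles observed in each of two populations, and shows why the task suits parallel processing: spectra computed on separate chunks of sites can simply be added together.

```python
from functools import reduce


def joint_sfs(sites, n1, n2):
    """Count sites by (derived alleles in pop 1, derived alleles in pop 2).

    sites: iterable of (pop1_alleles, pop2_alleles), each a list of 0/1 values
           with one entry per sampled chromosome (hypothetical toy layout).
    n1, n2: number of sampled chromosomes in each population.
    Returns an (n1 + 1) x (n2 + 1) matrix as a list of lists.
    """
    spectrum = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for pop1, pop2 in sites:
        spectrum[sum(pop1)][sum(pop2)] += 1
    return spectrum


def merge_spectra(a, b):
    """Add two spectra of the same shape, cell by cell."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def chunked(seq, size):
    """Yield consecutive slices of seq with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


# Four toy sites, two chromosomes sampled per population.
sites = [
    ([0, 1], [0, 0]),
    ([1, 1], [0, 1]),
    ([0, 1], [0, 0]),
    ([0, 0], [1, 1]),
]

whole = joint_sfs(sites, 2, 2)

# Each chunk is processed independently here, one after another; in a real
# pipeline each chunk could be handed to a separate worker process.
partials = [joint_sfs(chunk, 2, 2) for chunk in chunked(sites, 2)]
combined = reduce(merge_spectra, partials)

print(whole)
print(combined == whole)
```

Because the per-chunk spectra are merged by plain addition, the result does not depend on how the sites are split, which is the property a parallel implementation relies on.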


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0086/4298444/9600745dff95/ece30005-0172-f1.jpg
