VCF2Dis: an ultra-fast and efficient tool to calculate pairwise genetic distance and construct population phylogeny from VCF files.

Author information

Xu Lian, He Weiming, Tai Shuaishuai, Huang Xiaoli, Qin Mumu, Liao Xun, Jing Yi, Yang Jian, Fang Xiaodong, Shi Jianhua, Jin Nana

Affiliations

Institute for Translational Neuroscience of Affiliated Hospital 2 of Nantong University, Center for Neural Developmental and Degenerative Research of Nantong University, Key Laboratory of Neurodegenerative Diseases, Nantong, Jiangsu 226014, China.

Key Laboratory of Neuroregeneration, Ministry of Education and Jiangsu Province, Co-innovation Center of Neuroregeneration, NMPA Key Laboratory for Research and Evaluation of Tissue Engineering Technology Products, Nantong University, Nantong, Jiangsu 226001, China.

Publication information

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf032.

DOI:10.1093/gigascience/giaf032
PMID:40184433
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11970368/
Abstract

BACKGROUND

Genetic distance metrics are crucial for understanding the evolutionary relationships and population structure of organisms. Progress in next-generation sequencing technology has given rise to genotyping data for thousands of individuals. The standard Variant Call Format (VCF) is widely used to store genomic variation information, but calculating genetic distance and constructing population phylogeny directly from large VCF files can be challenging. Moreover, the existing tools that implement such functions remain limited and perform poorly when processing large-scale genotype data, especially with respect to memory efficiency.

FINDINGS

To address these challenges, we introduce VCF2Dis, an ultra-fast and efficient tool that calculates pairwise genetic distance directly from large VCF files and then constructs distance-based population phylogeny using the ape package. Benchmarking results demonstrate the tool's efficiency, with rapid processing times, minimal memory usage (e.g., 0.37 GB for the complete analysis of 2,504 samples with 81.2 million variants), and high accuracy, even when handling datasets with millions of variants from thousands of individuals. Its straightforward command-line interface, compatibility with downstream phylogenetic analysis tools (e.g., MEGA, Phylip, and FastTree), and support for multithreading make it a valuable tool for researchers studying population relationships. Owing to these advantages, VCF2Dis has already been widely used in many published genomic studies.
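To make the idea of a pairwise genetic distance computed from VCF genotypes concrete, here is a minimal Python sketch. It is not the VCF2Dis implementation, and the distance definition is an assumption chosen for illustration: for biallelic diploid calls, each site contributes half the absolute difference in ALT-allele dosage, averaged over the sites where both samples are called. The function names (`parse_dosages`, `pairwise_distance`, `to_phylip`) and the toy VCF text are hypothetical.

```python
from itertools import combinations

# Toy three-sample VCF, built in memory for illustration only.
VCF_TEXT = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "S1", "S2", "S3"]),
    "\t".join(["1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0/0", "0/1", "1/1"]),
    "\t".join(["1", "200", ".", "C", "T", ".", "PASS", ".", "GT", "0/0", "0/0", "1/1"]),
    "\t".join(["1", "300", ".", "G", "A", ".", "PASS", ".", "GT", "0/1", "./.", "1/1"]),
])


def parse_dosages(vcf_text):
    """Return sample names and, per site, ALT-allele dosages (None = missing)."""
    samples, sites = [], []
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta lines and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        dosages = []
        for call in fields[9:]:
            # Assumes GT is the first FORMAT key; phasing is ignored.
            alleles = call.split(":")[0].replace("|", "/").split("/")
            if len(alleles) != 2 or "." in alleles:
                dosages.append(None)  # missing or non-diploid call
            else:
                dosages.append(sum(1 for a in alleles if a != "0"))
        sites.append(dosages)
    return samples, sites


def pairwise_distance(samples, sites):
    """Mean per-site dosage difference (assumed metric), as a symmetric matrix."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diff, called = 0.0, 0
        for dosages in sites:
            a, b = dosages[i], dosages[j]
            if a is None or b is None:
                continue  # use only sites called in both samples
            diff += abs(a - b) / 2
            called += 1
        dist[i][j] = dist[j][i] = diff / called if called else float("nan")
    return dist


def to_phylip(samples, dist):
    """Format the matrix as a square PHYLIP-style distance file."""
    lines = [str(len(samples))]
    for name, row in zip(samples, dist):
        lines.append(name.ljust(10) + " " + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines)


samples, sites = parse_dosages(VCF_TEXT)
dist = pairwise_distance(samples, sites)
print(to_phylip(samples, dist))
```

A square matrix in this form is the kind of input that distance-based tree builders, such as neighbor joining, typically accept. A production tool would stream the VCF rather than hold every site in memory, which this sketch does not attempt.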

CONCLUSION

We present VCF2Dis, a straightforward and efficient tool for calculating genetic distance and constructing population phylogeny directly from large-scale genotype data. VCF2Dis has been widely applied, facilitating the exploration of population relationships in large-scale genome sequencing studies.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da38/11970368/784596a8a88c/giaf032fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da38/11970368/eafc665450e7/giaf032fig2.jpg
