VCF2Dis: an ultra-fast and efficient tool to calculate pairwise genetic distance and construct population phylogeny from VCF files.

Author information

Xu Lian, He Weiming, Tai Shuaishuai, Huang Xiaoli, Qin Mumu, Liao Xun, Jing Yi, Yang Jian, Fang Xiaodong, Shi Jianhua, Jin Nana

Affiliations

Institute for Translational Neuroscience of Affiliated Hospital 2 of Nantong University, Center for Neural Developmental and Degenerative Research of Nantong University, Key Laboratory of Neurodegenerative Diseases, Nantong, Jiangsu 226014, China.

Key Laboratory of Neuroregeneration, Ministry of Education and Jiangsu Province, Co-innovation Center of Neuroregeneration, NMPA Key Laboratory for Research and Evaluation of Tissue Engineering Technology Products, Nantong University, Nantong, Jiangsu 226001, China.

Publication information

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf032.

Abstract

BACKGROUND

Genetic distance metrics are crucial for understanding the evolutionary relationships and population structure of organisms. Progress in next-generation sequencing technology has given rise to genotyping data for thousands of individuals. The standard Variant Call Format (VCF) is widely used to store genomic variation information, but calculating genetic distance and constructing population phylogeny directly from large VCF files can be challenging. Moreover, few existing tools implement these functions, and those that do perform poorly on large-scale genotype data, particularly in terms of memory efficiency.

FINDINGS

To address these challenges, we introduce VCF2Dis, an ultra-fast and efficient tool that calculates pairwise genetic distance directly from large VCF files and then constructs distance-based population phylogeny using the ape package. Benchmarking results demonstrate the tool's efficiency, with rapid processing times, minimal memory usage (e.g., 0.37 GB for the complete analysis of 2,504 samples with 81.2 million variants), and high accuracy, even when handling datasets with millions of variants from thousands of individuals. Its straightforward command-line interface, compatibility with downstream phylogenetic analysis tools (e.g., MEGA, Phylip, and FastTree), and support for multithreading make it a valuable tool for researchers studying population relationships. Owing to these advantages, VCF2Dis has already been widely used in many published genomic studies.
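To make the idea of a pairwise genetic distance computed directly from VCF genotypes concrete, the following is a minimal Python sketch. It is not VCF2Dis's implementation, and the abstract does not specify the distance formula; the sketch assumes a simple diploid allele-dosage distance (mean of |dosage_i - dosage_j| / 2 over sites where both samples are called). The function names `parse_genotypes` and `pairwise_distance` and the example data are hypothetical.

```python
from itertools import combinations


def parse_genotypes(vcf_text):
    """Return (sample_names, sites); each site is a list of alt-allele
    dosages per sample, with None for a missing call."""
    samples, sites = [], []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start at column 10
            continue
        dosages = []
        for call in fields[9:]:
            # GT is the first FORMAT key when present; treat phased as unphased
            alleles = call.split(":")[0].replace("|", "/").split("/")
            if "." in alleles:
                dosages.append(None)
            else:
                dosages.append(sum(a != "0" for a in alleles))
        sites.append(dosages)
    return samples, sites


def pairwise_distance(samples, sites):
    """Assumed metric: mean per-site dosage difference, scaled to [0, 1]
    for diploid calls, ignoring sites where either sample is missing."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diff, called = 0.0, 0
        for dosages in sites:
            a, b = dosages[i], dosages[j]
            if a is None or b is None:
                continue
            diff += abs(a - b) / 2
            called += 1
        d = diff / called if called else float("nan")
        dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical four-site, three-sample VCF, built with explicit tabs.
ROWS = [
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "S1", "S2", "S3"],
    ["1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0/0", "0/1", "1/1"],
    ["1", "200", ".", "C", "T", ".", "PASS", ".", "GT", "0/0", "0/0", "1/1"],
    ["1", "300", ".", "G", "A", ".", "PASS", ".", "GT", "0/1", "./.", "1/1"],
    ["1", "400", ".", "T", "C", ".", "PASS", ".", "GT", "1|1", "1/1", "0/0"],
]
EXAMPLE_VCF = "##fileformat=VCFv4.2\n" + "\n".join("\t".join(r) for r in ROWS)

samples, sites = parse_genotypes(EXAMPLE_VCF)
dist = pairwise_distance(samples, sites)
for name, row in zip(samples, dist):
    print(name, " ".join(f"{d:.4f}" for d in row))
```

A symmetric matrix of this kind is the sort of input that distance-based tree builders (such as neighbor joining in ape) consume. This sketch holds all sites in memory, which a tool targeting tens of millions of variants would presumably avoid by streaming the file and accumulating per-pair sums.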

CONCLUSION

We present VCF2Dis, a straightforward and efficient tool for calculating genetic distance and constructing population phylogeny directly from large-scale genotype data. VCF2Dis has been widely applied, facilitating the exploration of population relationships in large-scale genome sequencing studies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da38/11970368/784596a8a88c/giaf032fig1.jpg
