Department of Biology, University of Oxford, Oxford, UK.
Mol Ecol Resour. 2024 Jul;24(5):e13962. doi: 10.1111/1755-0998.13962. Epub 2024 Apr 22.
Preparation of DNA polymorphism datasets for analysis is an important step in evolutionary genetic and molecular ecology studies. Ever-growing dataset sizes make this step time-consuming, but few convenient software tools are available to facilitate processing of large-scale datasets that include thousands of sequence alignments. Here I report "processor of sequences v4" (proSeq4), user-friendly multiplatform software for the preparation and evolutionary genetic analysis of genome- or transcriptome-scale sequence polymorphism datasets. The program has an easy-to-use graphical user interface and is designed to process and analyse many thousands of datasets. It supports over two dozen file formats and includes a flexible sequence editor and various tools for data visualization, quality control and the most commonly used evolutionary genetic analyses, such as neighbour-joining (NJ) phylogeny reconstruction, DNA polymorphism analyses and coalescent simulations. Command line tools (e.g. vcf2fasta) are also provided for easier integration into bioinformatic pipelines. Apart from molecular ecology and evolution research, proSeq4 may be useful for teaching, e.g. for visual illustration of the different shapes of phylogenies generated with coalescent simulations under different scenarios. ProSeq4 source code and binaries for Windows, macOS and Ubuntu are available from https://sourceforge.net/projects/proseq/.
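As an illustration of what "DNA polymorphism analyses" typically involve, the sketch below computes three standard summary statistics from a small alignment: the number of segregating sites (S), nucleotide diversity (pi) and Watterson's theta, the latter two per site. It is not proSeq4 code and does not reflect how proSeq4 implements these statistics; the function names and the toy alignment are hypothetical, and the sketch assumes equal-length sequences containing only A, C, G and T (no gaps or ambiguity codes).

```python
from itertools import combinations


def segregating_sites(alignment):
    """Count alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def nucleotide_diversity(alignment):
    """Mean number of pairwise differences per site (pi)."""
    length = len(alignment[0])
    pairs = list(combinations(alignment, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_diffs / (len(pairs) * length)


def watterson_theta(alignment):
    """Watterson's estimator per site: S / a1 / L, with a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(alignment)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(alignment) / a1 / len(alignment[0])


# Hypothetical toy alignment: four sequences, eight sites, no gaps.
alignment = ["ACGTACGT", "ACGTACGA", "ACTTACGA", "ACGTACGT"]

print("S     =", segregating_sites(alignment))
print("pi    =", round(nucleotide_diversity(alignment), 4))
print("theta =", round(watterson_theta(alignment), 4))
```

For this toy alignment, two columns are polymorphic, so S = 2; the six sequence pairs differ at 7 positions in total, giving pi = 7/48 per site; and with n = 4, a1 = 11/6, so theta = 3/22 per site. Real datasets would also need explicit handling of gaps, missing data and ambiguity codes, which this sketch omits.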