Department of Life Sciences, Imperial College London, London, UK.
Institute of Population Genetics, University of Veterinary Medicine Vienna, Vienna, Austria.
F1000Res. 2023 Jul 14;11:126. doi: 10.12688/f1000research.104368.2. eCollection 2022.
A sound analysis of DNA sequencing data is important to extract meaningful information and infer quantities of interest. Sequencing and mapping errors, coupled with low and variable coverage, hamper the identification of genotypes and variants and the estimation of population genetic parameters. Currently available methods and implementations for estimating population genetic parameters from sequencing data are suitable only for genomes of model organisms, require moderate sequencing coverage, or are not easily adaptable to specific applications. To address these issues, we introduce ngsJulia, a collection of templates and functions in the Julia language for processing short-read sequencing data for population genetic analysis. We further describe two implementations, ngsPool and ngsPloidy, for the analysis of pooled sequencing data and polyploid genomes, respectively. Through simulations, we illustrate the performance of these implementations in estimating various population genetic parameters with both established and novel statistical methods. These results inform optimal experimental design and demonstrate that the methods in ngsJulia can estimate parameters of interest even from low-coverage sequencing data. ngsJulia provides users with a flexible and efficient framework for ad hoc analysis of sequencing data. ngsJulia is available from: https://github.com/mfumagalli/ngsJulia.