Maruki Takahiro, Ozere April, Freeman Jack, Cristescu Melania E
Department of Biology, McGill University, Montreal, Quebec, Canada.
Ecol Evol. 2024 Nov 10;14(11):e70339. doi: 10.1002/ece3.70339. eCollection 2024 Nov.
Accurate estimates of mutation rates derived from genome-wide mutation accumulation (MA) data are fundamental to understanding basic evolutionary processes. Rapidly improving high-throughput sequencing technologies provide unprecedented opportunities to identify single nucleotide mutations across genomes. However, such MA-derived data are often difficult to analyze, and the performance of the available analysis methods is not well understood. In this study, we used the existing Bayesian Genotype Caller, adapted for MA data and referred to here as Bayesian Mutation Finder (BMF), to identify single nucleotide mutations while accounting for the characteristics of the data. We compared the performance of BMF with that of the widely used Genome Analysis Toolkit (GATK) by applying both methods to time-series MA data as well as simulated data. The time-series data were obtained by propagating MA lines over an average of 188 generations and performing whole-genome sequencing of 14 MA lines across three time points. The results indicate that BMF identifies single nucleotide mutations more accurately than GATK, especially when applied to the empirical data. Furthermore, BMF uses fewer parameters and is more computationally efficient than GATK. Both BMF and GATK found a surprisingly large number of candidate mutations that were not confirmed at later time points. We systematically infer the causes of the unconfirmed candidate mutations, introduce a framework for estimating mutation rates based on genome-wide candidate mutations confirmed by subsequent sequencing, and provide an improved mutation rate estimate for .
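The abstract does not spell out the estimation framework, so the following is only a minimal sketch of the general idea of counting candidate mutations that are confirmed by later sequencing. It is not the authors' method and is not BMF or GATK code. The function names, the `(line, site)` representation of a call, and the rule that a candidate is confirmed only if it reappears at every later time point are all assumptions made for illustration, as is the simple rate formula (confirmed mutations divided by the sum over lines of callable sites times generations).

```python
def confirmed_candidates(calls_by_timepoint):
    """Return candidates confirmed by later sequencing.

    calls_by_timepoint: list of sets of (line, site) tuples, in
    chronological order (hypothetical representation of mutation calls).
    Assumed rule: a candidate first called at time point t is confirmed
    if it is also called at every time point after t. Candidates first
    called at the final time point cannot be evaluated and are skipped.
    """
    confirmed = set()
    seen = set()
    for t, calls in enumerate(calls_by_timepoint[:-1]):
        for key in calls - seen:  # candidates first called at time point t
            if all(key in later for later in calls_by_timepoint[t + 1:]):
                confirmed.add(key)
        seen |= calls
    return confirmed


def mutation_rate(n_confirmed, callable_sites, generations):
    """Per-site, per-generation rate under the assumed simple formula.

    callable_sites, generations: dicts keyed by MA line. The generations
    should be those over which a new mutation could still be confirmed
    (an assumption of this sketch, not a statement from the source).
    """
    denom = sum(callable_sites[line] * generations[line] for line in generations)
    return n_confirmed / denom


# Toy data, invented purely to exercise the functions.
calls = [
    {("A", 10), ("A", 20)},           # time point 1
    {("A", 10), ("B", 5)},            # time point 2
    {("A", 10), ("B", 5), ("B", 7)},  # time point 3
]
confirmed = confirmed_candidates(calls)
rate = mutation_rate(
    len(confirmed),
    callable_sites={"A": 1000, "B": 1000},
    generations={"A": 100, "B": 100},
)
```

In the toy data, the candidate at site 20 in line A disappears after the first time point and is dropped, while the candidate at site 7 in line B appears only at the final time point and cannot be evaluated. A real analysis would also need to handle sequencing depth, callable-site definitions, and ploidy, none of which this sketch models.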