Evolutionary Biology Group, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland.
Mol Ecol Resour. 2023 Nov;23(8):1757-1771. doi: 10.1111/1755-0998.13841. Epub 2023 Jul 24.
Mutations are the primary source of all genetic variation. Knowledge of their rates is critical for any evolutionary genetic analysis, but for a long time that knowledge remained elusive and could only be inferred indirectly. In recent years, parent-offspring comparisons have yielded the first direct mutation rate estimates. These analyses are challenging, however, because of a high rate of false positives and the lack of consensus on standardized filtering of candidate de novo mutations. Here, we validate the application of a machine learning (ML) approach to this task and estimate the mutation rate for the guppy (Poecilia reticulata), a model species in eco-evolutionary studies. We sequenced 4 parents and 20 offspring and screened their genomes for de novo mutations. The initially large number of candidate de novo mutations was hard-filtered to remove false positives. The resulting estimate was compared with the mutation rate estimated using a supervised machine learning approach. Both approaches were followed by molecular validation of all candidate de novo mutations and yielded similar results. The ML method uniquely identified three mutations, but overall it required more hands-on curation and had higher rates of false positives and false negatives. Both methods concordantly showed no difference in mutation rates between families. The guppy mutation rate estimated here is among the lowest directly estimated mutation rates in vertebrates; however, previous research has also found low estimated rates in other teleost fishes. We discuss potential explanations for this pattern, as well as the future utility and limitations of machine learning approaches.
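For readers unfamiliar with how a direct estimate is obtained from parent-offspring data, the following is a minimal sketch of the commonly used per-site, per-generation estimator: validated de novo mutations divided by twice the total number of callable sites across offspring, optionally corrected for an assumed false-negative rate. The abstract does not state the exact formula, callable-genome size, or correction used in this study, so the function name, its parameters, and all numbers below are hypothetical placeholders, not results from the paper.

```python
from typing import Sequence


def mutation_rate(
    n_dnms: int,
    callable_sites_per_offspring: Sequence[int],
    false_negative_rate: float = 0.0,
) -> float:
    """Per-site, per-generation mutation rate from parent-offspring data.

    Hypothetical helper for illustration only; not the authors' pipeline.
    """
    if n_dnms < 0:
        raise ValueError("n_dnms must be non-negative")
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")

    total_callable = sum(callable_sites_per_offspring)
    if total_callable <= 0:
        raise ValueError("total callable sites must be positive")

    # Each diploid offspring carries two haploid genomes (one per parent),
    # so every callable site offers two opportunities for a new mutation.
    haploid_sites = 2 * total_callable

    # Mutations missed by the filters shrink the effective search space;
    # dividing by (1 - FNR) compensates for them.
    return n_dnms / (haploid_sites * (1.0 - false_negative_rate))


# Invented example values (NOT data from the study):
# 8 validated mutations across 4 offspring, 1e9 callable sites each.
example_rate = mutation_rate(8, [1_000_000_000] * 4)
print(f"{example_rate:.2e}")
```

Summing callable sites per offspring, rather than multiplying a single genome size by the number of offspring, allows for coverage that differs between individuals.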