Department of Biology, University of Florida.
Department of Biology, Duke University.
Genome Biol Evol. 2018 Nov 1;10(11):2882-2898. doi: 10.1093/gbe/evy200.
Genomic data have provided evidence of previously unknown ancient whole genome duplications (WGDs) and highlighted the role of WGDs in the evolution of many eukaryotic lineages. Ancient WGDs are often detected by examining the distribution of synonymous substitutions per synonymous site (Ks) among paralogs within a genome, or "Ks plots." For example, WGDs can be detected from Ks plots by using univariate mixture models to identify peaks in the Ks distribution. We performed gene family simulation experiments to evaluate how different Ks estimation methods and mixture models affect our ability to detect ancient WGDs from Ks plots. The simulations, which accounted for variation in substitution rates and in gene duplication and loss rates across gene families, tested how WGD age and the rate of gene retention following WGD affect the inference of WGDs from Ks plots. Our simulations reveal limitations of Ks plot analyses. Strict interpretations of mixture model analyses often overestimate the number of WGD events, and Ks plot analyses typically fail to detect WGDs when ≤10% of the duplicated genes are retained following the WGD. However, WGDs can be accurately characterized over an intermediate range of Ks. The simulation results are supported by empirical analyses of transcriptomic data, which also suggest that biases in gene retention likely affect our ability to detect ancient WGDs. Although our findings indicate that mixture model results should be interpreted with great caution, using node-averaged Ks estimates and applying more appropriate mixture models can improve the accuracy of WGD detection.
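To make the mixture-model step concrete, here is a minimal sketch, assuming a univariate Gaussian mixture fitted by EM to log-transformed Ks values, with the number of components chosen by BIC. This is one common way to look for peaks in a Ks plot, not necessarily any of the specific models compared in the study; the function names (`fit_gmm`, `bic`, `choose_k`) and the simulated peak positions are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(xs, weights, means, sds):
    """Log-likelihood of xs under a univariate Gaussian mixture."""
    return sum(
        math.log(max(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)), 1e-300))
        for x in xs
    )


def fit_gmm(xs, k, max_iter=300, tol=1e-6, min_sd=0.05):
    """Fit a k-component Gaussian mixture to xs (here: log Ks values) by EM.

    Returns (weights, means, sds, log_likelihood). min_sd is an assumed floor
    that keeps a component from collapsing onto a single point.
    """
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, one shared sd, equal weights.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    sds = [max(math.sqrt(sum((x - mu) ** 2 for x in xs) / n), min_sd)] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each observation.
        resp, ll = [], 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = max(sum(dens), 1e-300)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        if ll - prev_ll < tol:
            break  # converged (or no further improvement)
        prev_ll = ll
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return weights, means, sds, mixture_loglik(xs, weights, means, sds)


def bic(xs, k):
    """BIC of a k-component fit; free parameters are k means, k sds, k - 1 weights."""
    *_, ll = fit_gmm(xs, k)
    return (3 * k - 1) * math.log(len(xs)) - 2.0 * ll


def choose_k(xs, max_k=4):
    """Return the component count with the lowest BIC, plus all BIC scores."""
    scores = {k: bic(xs, k) for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical Ks values: two peaks near Ks = 0.1 and Ks = 1.0.
    ks = [rng.lognormvariate(math.log(0.1), 0.2) for _ in range(300)]
    ks += [rng.lognormvariate(math.log(1.0), 0.2) for _ in range(300)]
    log_ks = [math.log(v) for v in ks]
    best_k, scores = choose_k(log_ks, max_k=3)
    _, means, _, _ = fit_gmm(log_ks, best_k)
    print(best_k, [round(math.exp(m), 3) for m in sorted(means)])
```

Each retained component would be read as a candidate duplication peak, with its mean (back-transformed from log Ks) as a rough age estimate. As the abstract cautions, taking the BIC-selected component count literally on real Ks data can overstate the number of WGDs, so a sketch like this is a starting point for inspection rather than a test for WGDs.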