Zhang Yue, Zheng Chunfang, Sankoff David
Department of Mathematics and Statistics, University of Ottawa, 150 Louis Pasteur Pvt, Ottawa, K1N 6N5 Canada.
Algorithms Mol Biol. 2019 Aug 1;14:18. doi: 10.1186/s13015-019-0153-8. eCollection 2019.
The statistical distribution of the similarity or difference between pairs of paralogous genes, created by whole genome doubling, or between pairs of orthologous genes in two related species is an important source of information about genomic evolution, especially in plants.
We derive the mixture of distributions of sequence similarity for duplicate gene pairs generated by repeated episodes of whole genome doubling. This involves integrating sequence divergence and gene pair loss through fractionation, using a branching process and a mutational model. We account not only for the timing of these events in terms of local modes, but also for the amplitude and variance of the component distributions. This model is then extended to orthologous gene pairs.
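The way a branching process and a mutational model combine to give a mixture can be illustrated with a toy simulation. What follows is a minimal sketch under stated assumptions, not the authors' model or inference procedure. It assumes that every doubling event duplicates each gene copy, that fractionation then keeps both copies with a single fixed probability (otherwise one copy is lost), and that similarity decays according to a Jukes-Cantor-style formula. All names (`simulate_gene_family`, `pair_counts`, `expected_similarity`, `keep_both_prob`, `rate`) are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def simulate_gene_family(n_events, keep_both_prob, rng):
    """Surviving copies of one ancestral gene after n_events doublings.

    Each copy is recorded as a tuple of 0/1 labels, one per doubling event.
    Assumption: after each doubling both copies are kept with probability
    keep_both_prob; otherwise fractionation removes one of them.
    """
    lineages = [()]
    for _ in range(n_events):
        next_generation = []
        for lineage in lineages:
            next_generation.append(lineage + (0,))
            if rng.random() < keep_both_prob:
                next_generation.append(lineage + (1,))
        lineages = next_generation
    return lineages


def pair_origin(a, b):
    """Index of the doubling event at which two surviving copies diverged."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    raise ValueError("lineages are identical")


def pair_counts(n_events, keep_both_prob, n_genes, seed=0):
    """Count duplicate gene pairs by the doubling event that created them.

    The counts play the role of the amplitudes of the mixture components.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_genes):
        family = simulate_gene_family(n_events, keep_both_prob, rng)
        for a, b in combinations(family, 2):
            counts[pair_origin(a, b)] += 1
    return counts


def expected_similarity(elapsed, rate):
    """Assumed Jukes-Cantor-style similarity of a pair after `elapsed` time.

    Starts at 1.0 for a fresh duplicate and decays toward 0.25 (chance level
    for four nucleotides). `rate` is a hypothetical per-pair divergence rate.
    """
    return 0.25 + 0.75 * math.exp(-rate * elapsed)


if __name__ == "__main__":
    counts = pair_counts(n_events=3, keep_both_prob=0.4, n_genes=5000, seed=1)
    for event in sorted(counts):
        print(f"pairs created by doubling event {event}: {counts[event]}")
```

In this sketch each doubling event contributes one component to the mixture: its location is set by `expected_similarity` at the time elapsed since the event, and its weight is the number of pairs that trace back to it.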
We apply the model and inference procedures to the evolution of the Solanaceae, focusing on the genomes of economically important crops. We assess how consistent or variable fractionation rates are from species to species and over time.