Semeria Magali, Tannier Eric, Guéguen Laurent
BMC Bioinformatics. 2015;16 Suppl 14(Suppl 14):S5. doi: 10.1186/1471-2105-16-S14-S5. Epub 2015 Oct 2.
Most models of genome evolution address genetic sequences, gene content, or gene order. Some integrate two of these three levels, but rarely all three. Probabilistic models of gene order evolution usually have to assume constant gene content, or adopt a presence/absence coding of gene neighborhoods that is blind to complex events modifying gene content.
We propose a probabilistic evolutionary model for gene neighborhoods that allows genes to be inserted, duplicated, or lost. It uses reconciled phylogenies, which integrate sequence and gene content evolution. This lets us optimize parameters such as phylogeny branch lengths, or the probability distributions describing how susceptibility to rearrangement varies among syntenic regions. We reconstruct a structure for ancestral genomes by optimizing a likelihood, keeping track of all evolutionary events at the level of gene content and gene synteny. Each ancestral synteny is assigned a probability of presence.