Catanzaro Daniele, Pesenti Raffaele, Milinkovitch Michel C
Laboratory of Evolutionary Genetics, Institute for Molecular Biology and Medicine, IBMM, Université Libre de Bruxelles CP300, Rue Jeener et Brachet 12, B-6041, Gosselies, Belgium.
Bioinformatics. 2006 Mar 15;22(6):708-15. doi: 10.1093/bioinformatics/btk001. Epub 2006 Jan 5.
The general time-reversible (GTR) model is one of the most popular models of nucleotide substitution because it offers a good trade-off between mathematical tractability and biological realism. However, when it is applied to infer evolutionary distances and/or instantaneous rate matrices, the GTR model appears more prone to inapplicability than more restrictive time-reversible models. Although it has been noted previously that this inapplicability arises from the impossibility of computing the logarithm of a matrix with negative eigenvalues, the issue has not been investigated further.
Here, we formally characterize the mathematical conditions that lead to the inapplicability of the GTR model, and discuss their biological interpretation. We investigate how the occurrence of negative eigenvalues relates to both sequence length and sequence divergence. We then propose a possible reformulation of previous procedures as a non-linear optimization problem. We analytically investigate the effect of our approach on the estimated evolutionary distances and transition probability matrix. Finally, we analyze the quality of the solution we propose and discuss a numerical example.
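To make the failure mode concrete, the following is a minimal sketch of the standard closed-form GTR distance, d = -tr(Π log(Π⁻¹F)), together with the eigenvalue check that decides whether the matrix logarithm exists. It is not the reformulation proposed in the paper. The function name `gtr_distance`, the 4x4 pair-count input, and the choice to symmetrize the counts are illustrative assumptions, and the sketch further assumes that every nucleotide occurs at least once.

```python
import numpy as np


def gtr_distance(counts):
    """Closed-form GTR distance from a 4x4 table of aligned-site pair counts.

    Hypothetical helper for illustration. Returns (distance, eigenvalues);
    distance is None when a non-positive eigenvalue makes the real matrix
    logarithm undefined.
    """
    N = np.asarray(counts, dtype=float)
    # Symmetrized divergence matrix F (joint frequencies of nucleotide pairs).
    F = (N + N.T) / (2.0 * N.sum())
    # Base frequencies pi are the row sums of F (assumed strictly positive).
    pi = F.sum(axis=1)
    # P = Pi^{-1} F is similar to the symmetric matrix S = Pi^{-1/2} F Pi^{-1/2},
    # so its eigenvalues are real and can be obtained with eigh.
    s = 1.0 / np.sqrt(pi)
    S = s[:, None] * F * s[None, :]
    eigvals, U = np.linalg.eigh(S)
    if np.any(eigvals <= 0.0):
        # log(P) has no real solution: the GTR distance is inapplicable.
        return None, eigvals
    log_S = (U * np.log(eigvals)) @ U.T
    # diag(log P) equals diag(log S), hence tr(Pi log P) = sum_i pi_i (log S)_ii.
    distance = -float(np.sum(pi * np.diag(log_S)))
    return distance, eigvals


# Moderately diverged pair: all eigenvalues positive, distance is defined.
ok_counts = [[220, 10, 10, 10],
             [10, 220, 10, 10],
             [10, 10, 220, 10],
             [10, 10, 10, 220]]
d_ok, ev_ok = gtr_distance(ok_counts)

# Short, strongly diverged pair: a negative eigenvalue appears.
bad_counts = [[1, 5, 0, 0],
              [5, 1, 0, 0],
              [0, 0, 5, 0],
              [0, 0, 0, 5]]
d_bad, ev_bad = gtr_distance(bad_counts)
```

With equal base frequencies and equal off-diagonal counts, as in `ok_counts`, the formula reduces to the Jukes-Cantor distance, which gives a convenient sanity check. In `bad_counts`, A/C mismatches outnumber A/A and C/C matches, which is the kind of short, highly divergent input for which the logarithm breaks down.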