Department of Cell Research and Immunology, School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel.
Bioinformatics. 2019 Aug 1;35(15):2562-2568. doi: 10.1093/bioinformatics/bty1031.
Ancestral sequence reconstruction (ASR) is widely used to understand protein evolution, structure and function. Current ASR methodologies do not fully consider differences in evolutionary constraints among positions imposed by the three-dimensional (3D) structure of the protein. Here, we developed an ASR algorithm that allows different protein sites to evolve according to different mixtures of replacement matrices. We show that assigning replacement matrices to protein positions based on their solvent accessibility leads to ASR with higher log-likelihoods compared to naïve models that assume a single replacement matrix for all sites. Improved ASR log-likelihoods are also demonstrated when solvent accessibility is predicted from protein sequences rather than inferred from a known 3D structure. Finally, we show that using such structure-aware mixture models results in substantial differences in the inferred ancestral sequences.
Supplementary data are available at Bioinformatics online.
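The idea of letting each site evolve under a mixture of replacement matrices chosen by solvent accessibility can be illustrated with a deliberately small sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the four-letter alphabet, the F81-style transition model (used here in place of empirical replacement matrices because it has a closed form), the `BURIED` and `EXPOSED` frequency vectors, the mixture weights, and the star-shaped tree. A realistic analysis would apply Felsenstein's pruning algorithm to a full phylogeny with 20-state matrices.

```python
import numpy as np

ALPHABET = "LVKD"  # toy alphabet: two hydrophobic (L, V) and two polar (K, D) residues

# Hypothetical stationary frequencies standing in for two replacement matrices.
BURIED = np.array([0.4, 0.4, 0.1, 0.1])   # favours hydrophobic residues
EXPOSED = np.array([0.1, 0.1, 0.4, 0.4])  # favours polar residues

# Hypothetical mixture weights (buried component, exposed component) per site class.
MIXTURE_WEIGHTS = {"buried": (0.9, 0.1), "exposed": (0.1, 0.9)}


def transition_matrix(pi, t):
    """F81-style P(t): keep the state with probability e^-t, else redraw from pi."""
    decay = np.exp(-t)
    return decay * np.eye(len(pi)) + (1.0 - decay) * np.tile(pi, (len(pi), 1))


def root_joint(column, branch_lengths, pi):
    """P(root = a, observed leaves) for every state a, on a star tree."""
    joint = pi.copy()
    for residue, t in zip(column, branch_lengths):
        joint *= transition_matrix(pi, t)[:, ALPHABET.index(residue)]
    return joint


def reconstruct(column, branch_lengths, accessibility):
    """Marginal root reconstruction for one site under the accessibility-specific mixture."""
    weights = MIXTURE_WEIGHTS[accessibility]
    joint = sum(
        w * root_joint(column, branch_lengths, pi)
        for w, pi in zip(weights, (BURIED, EXPOSED))
    )
    site_likelihood = joint.sum()
    posterior = joint / site_likelihood
    best_state = ALPHABET[int(np.argmax(posterior))]
    return best_state, posterior, float(np.log(site_likelihood))


if __name__ == "__main__":
    column = "LLV"                 # residues observed at three leaves for one site
    branch_lengths = [0.5, 0.5, 0.5]
    for label in ("buried", "exposed"):
        state, posterior, log_lik = reconstruct(column, branch_lengths, label)
        print(label, state, np.round(posterior, 3), round(log_lik, 3))
```

In this toy setting, the same alignment column receives a different site log-likelihood and a different posterior over ancestral states depending on whether the site is labelled buried or exposed. That is the mechanism by which a structure-aware mixture can change both the model fit and the inferred ancestor.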