Department of Computational and Quantitative Medicine and Diabetes and Metabolism Research Institute, Beckman Research Institute of the City of Hope, 1500 East Duarte Road, Duarte, CA, 91010-3000, USA.
J Mol Evol. 2019 Jul;87(4-6):184-198. doi: 10.1007/s00239-019-09899-z. Epub 2019 Jul 13.
Recent developments in sequencing and the growth of bioinformatics resources have provided vast repositories of protein network and single nucleotide polymorphism data. These data allow us to re-examine, on a larger and more comprehensive scale, the relationship between protein-protein interactions and protein variability and evolutionary rates. This relationship has long remained unresolved, reflecting both shifting analysis approaches in the literature and growing data availability. In this study, we used several public genomic databases to investigate this relationship in human, mouse, pig, chicken, and zebrafish. We observed strong non-linear patterns (tending towards convex decreasing function shapes) between protein variability and the density of the corresponding protein-protein interactions across all five species. To investigate further, we carried out stochastic simulations modeling the interplay between protein connectivity and variability. Our results indicate that a simple negative linear correlation model, often suggested (or tacitly assumed) in the literature as either a null or an alternative hypothesis, fits the observed data poorly. After considering different (but still relatively simple, non-overfitting) simulation models, we found that a convex decreasing protein variability-connectivity function (specifically, exponential decay) gave a much better fit to the real data. We conclude that simple correlation models might be inadequate for describing the protein variability-connectivity interplay in vertebrates; they often tend towards false negatives, showing no more than marginal linear or rank correlation where there are in fact strong non-random patterns.
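The contrast the abstract draws between a negative linear model and an exponential-decay model can be illustrated with a toy simulation. The sketch below is not the authors' simulation. It assumes a hypothetical generative model in which variability follows a * exp(-b * k), with k the number of interactions, and multiplicative log-normal noise. The function names, the degree distribution, and all parameter values are assumptions made for illustration only. It fits a straight line and an exponential curve to the same synthetic data and compares their residual sums of squares.

```python
import math
import random


def least_squares(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rss(ys, preds):
    """Residual sum of squares between observations and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


def simulate(n=2000, a=1.0, b=0.15, noise=0.3, seed=0):
    """Synthetic (connectivity, variability) pairs; all values are assumed."""
    rng = random.Random(seed)
    # Hypothetical degree distribution: many low-degree proteins, few hubs.
    ks = [1 + int(rng.expovariate(1 / 15)) for _ in range(n)]
    # Exponential decay in k with multiplicative log-normal noise.
    vs = [a * math.exp(-b * k) * math.exp(rng.gauss(0, noise)) for k in ks]
    return ks, vs


def compare_fits(ks, vs):
    """Fit a line and an exponential curve; compare both on the original scale."""
    c0, c1 = least_squares(ks, vs)
    linear_rss = rss(vs, [c0 + c1 * k for k in ks])
    # Exponential fit via a straight line through log(variability).
    l0, l1 = least_squares(ks, [math.log(v) for v in vs])
    exp_rss = rss(vs, [math.exp(l0 + l1 * k) for k in ks])
    return {"linear_rss": linear_rss, "exp_rss": exp_rss, "decay_rate": -l1}


if __name__ == "__main__":
    ks, vs = simulate()
    print(compare_fits(ks, vs))
```

Because the synthetic data are generated from an exponential decay, the exponential fit is expected to leave a smaller residual than the straight line. The sketch shows only how such a model comparison can be set up and says nothing about the real protein data.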