Malinverni Duccio, Barducci Alessandro
Medical Research Council (MRC) Laboratory of Molecular Biology, Cambridge CB2 0QH, UK.
Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France.
Entropy (Basel). 2020 Jan 23;21(11):1127. doi: 10.3390/e21111127. Epub 2019 Nov 16.
Extracting structural information from sequence co-variation has become common practice in computational biology in recent years, mainly due to the availability of large sequence alignments of protein families. However, using sequence-based approaches to identify features that are specific to subclasses, and not shared by all members of the family, has remained an elusive problem. Here we present a coevolution-based method that differentially analyzes subfamily-specific structural features through a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test the method's predictive capabilities on the Response Regulator family, whose subfamilies have previously been shown to display distinct, specific homodimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data are relatively scarce. Furthermore, sequence reweighting makes it possible to assess whether individual structural contacts pertain to specific subfamilies, and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.
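To make the idea of continuous sequence reweighting concrete, here is a minimal sketch under stated assumptions. Each sequence receives a continuous weight (here from a hypothetical sigmoid of a per-sequence subfamily score), and co-variation statistics are computed from weighted frequencies instead of raw counts. As a simpler stand-in for a full coevolutionary model, column pairs are scored by weighted mutual information. The names `sigmoid_weight`, `weighted_frequencies` and `weighted_mutual_information`, the sigmoid weighting and the use of mutual information are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict
from itertools import combinations


def sigmoid_weight(score, midpoint=0.0, steepness=1.0):
    """Hypothetical continuous weight in (0, 1) from a per-sequence subfamily score."""
    return 1.0 / (1.0 + math.exp(-steepness * (score - midpoint)))


def weighted_frequencies(msa, weights):
    """Weighted single-site and pair-site frequencies of an aligned set of sequences."""
    total = sum(weights)
    length = len(msa[0])
    fi = [defaultdict(float) for _ in range(length)]
    for seq, w in zip(msa, weights):
        for i, a in enumerate(seq):
            fi[i][a] += w / total
    fij = {}
    for i, j in combinations(range(length), 2):
        pairs = defaultdict(float)
        for seq, w in zip(msa, weights):
            pairs[(seq[i], seq[j])] += w / total
        fij[(i, j)] = pairs
    return fi, fij


def weighted_mutual_information(msa, weights):
    """Mutual information for every column pair (i < j), using weighted frequencies."""
    fi, fij = weighted_frequencies(msa, weights)
    mi = {}
    for (i, j), pairs in fij.items():
        mi[(i, j)] = sum(
            p * math.log(p / (fi[i][a] * fi[j][b]))
            for (a, b), p in pairs.items()
            if p > 0  # pairs carrying zero weight contribute nothing
        )
    return mi


# Toy alignment (assumption): columns 0 and 2 co-vary (A<->D, G<->H),
# column 1 is constant and column 3 varies independently of column 0.
MSA = ["ACDE", "ACDF", "GCHE", "GCHF"]

if __name__ == "__main__":
    uniform = weighted_mutual_information(MSA, [1, 1, 1, 1])
    subfamily = weighted_mutual_information(MSA, [1, 1, 0, 0])
    scores = [2.0, 2.0, -2.0, -2.0]  # hypothetical subfamily scores
    continuous = weighted_mutual_information(MSA, [sigmoid_weight(s) for s in scores])
    print(uniform[(0, 2)], subfamily[(0, 2)], continuous[(0, 2)])
```

In this toy case, columns 0 and 2 co-vary maximally under uniform weights, show no co-variation when only the A/D sequences carry weight, and take an intermediate value under continuous weights. This illustrates, in a simplified setting, how shifting weight toward one group of sequences changes which column pairs appear coupled.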