Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
J Mol Biol. 2021 Oct 1;433(20):167106. doi: 10.1016/j.jmb.2021.167106. Epub 2021 Jun 15.
Traditional sequence analysis algorithms fail to identify distant homologies when these lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of any annotated domains or repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298, whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms make it easier to recognise domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, co-evolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution-based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
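The idea of inferring spatial proximity from correlated substitutions can be illustrated with the simplest classical co-evolution score: the mutual information (MI) between pairs of alignment columns, adjusted with the average product correction (APC) to subtract each column's background signal. This is a minimal sketch under stated assumptions, not the methodology of this review; the contact and distance predictors discussed here are considerably more sophisticated than a pairwise score. The function name `mi_apc_scores`, the 21-letter alphabet, the pseudocount of 0.5 and the eight-sequence toy alignment are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical alphabet for this sketch: 20 standard amino acids plus a gap.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def mi_apc_scores(msa, pseudocount=0.5):
    """Score every pair of alignment columns for correlated substitutions.

    Returns an (L, L) symmetric matrix of mutual information (MI) values
    corrected with the average product correction (APC); higher scores are
    read as putative residue-residue contacts. Memory grows as L^2 * q^2,
    so this is only suitable for short toy alignments.
    """
    index = {aa: k for k, aa in enumerate(ALPHABET)}
    X = np.array([[index[c] for c in seq] for seq in msa])  # (N, L) integer codes
    n_seq, length = X.shape
    q = len(ALPHABET)
    onehot = np.eye(q)[X]  # (N, L, q) one-hot encoding

    # Single-column and column-pair frequencies with a uniform pseudocount,
    # chosen so that the pair frequencies marginalise to the single ones.
    f_i = (onehot.sum(axis=0) + pseudocount / q) / (n_seq + pseudocount)
    pair_counts = np.einsum("nia,njb->ijab", onehot, onehot)
    f_ij = (pair_counts + pseudocount / q**2) / (n_seq + pseudocount)

    # MI_ij = sum_ab f_ij(a,b) * log(f_ij(a,b) / (f_i(a) * f_j(b)))
    expected = f_i[:, None, :, None] * f_i[None, :, None, :]
    mi = np.einsum("ijab,ijab->ij", f_ij, np.log(f_ij / expected))
    np.fill_diagonal(mi, 0.0)

    # APC: subtract the background expected from each column's average MI.
    col_mean = mi.sum(axis=1) / (length - 1)
    overall_mean = mi.sum() / (length * (length - 1))
    if overall_mean <= 0:
        return mi
    corrected = mi - np.outer(col_mean, col_mean) / overall_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected


# Invented toy alignment: columns 0 and 2 co-vary perfectly (A<->K, L<->E),
# while column 1 varies independently of both.
toy_msa = ["AGK", "LGE", "ACK", "LCE", "AGK", "LCE", "ACK", "LGE"]
scores = mi_apc_scores(toy_msa)
i, j = np.unravel_index(np.argmax(scores), scores.shape)
print(sorted((int(i), int(j))))  # prints [0, 2]
```

In a real application the highest-scoring off-diagonal pairs would be read as predicted contacts and assembled into a contact map; APC is included because raw MI is inflated for highly variable columns and by shared phylogeny, which otherwise swamps the true co-evolutionary signal.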