Department of Biology, Aix-Marseille Université and INSERM UMR_S 1072, Marseille, France.
Biomol Concepts. 2022 Feb 21;13(1):55-60. doi: 10.1515/bmc-2022-0006.
Accurate prediction of protein structure is one of the most challenging goals in biology. The most recent achievement is AlphaFold, a machine learning method that is claimed to have solved the structure of almost all human proteins. This technological breakthrough has been compared to the sequencing of the human genome. However, this triumphant claim should be treated with caution, as we identified serious flaws in some AlphaFold models. Disordered regions are often represented by large loops that clash with the overall protein geometry, leading to unrealistic structures, especially for membrane proteins. In fact, AlphaFold comes up against the notion that protein folding is not solely determined by genomic information. We suggest that all parameters that control the structure of a protein without being strictly encoded in its amino acid sequence should be termed the "epigenetic dimension of protein structure." Such parameters include, for instance, protein solvation by membrane lipids or the structuring of disordered proteins upon ligand binding, but exclude sequence-encoded sites of post-translational modifications such as glycosylation. In our view, this paradigm is necessary to reconcile two opposite properties of living systems: beyond rigorous biological coding, evolution has given way to a certain level of uncertainty and anarchy.