Department of Biochemistry, University of Oxford, Oxford, UK.
AmoAi Technologies, Oxford, UK.
Nat Commun. 2024 Aug 4;15(1):6601. doi: 10.1038/s41467-024-50955-0.
Understanding protein function is pivotal to comprehending the mechanisms that underlie many crucial biological activities, with far-reaching implications for medicine, biotechnology, and drug development. However, more than 200 million proteins remain uncharacterized, and computational efforts rely heavily on protein structural information to predict annotations of varying quality. Here, we present PhiGnet, a method that uses statistics-informed graph networks to predict a protein's functions solely from its sequence. The method inherently characterizes evolutionary signatures, allowing a quantitative assessment of the significance of residues that carry out specific functions. PhiGnet not only outperforms alternative approaches but also narrows the sequence-function gap, even in the absence of structural information. Our findings indicate that applying deep learning to evolutionary data can highlight functional sites at the residue level, providing valuable support for interpreting both existing properties and new functionalities of proteins in research and biomedicine.