Boadu Frimpong, Lee Ahhyun, Cheng Jianlin
Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, USA.
Methods Mol Biol. 2025;2941:101-111. doi: 10.1007/978-1-0716-4623-6_6.
Experimentally determining protein function is complex and time-consuming. As a result, many proteins have known sequences, predicted structures, and other important information but still lack functional annotations. This gap makes automated function prediction (AFP), the development of computational methods for predicting protein function, critically important. Most AFP methods draw on the diverse protein information available, such as sequences, structures, protein-protein interactions, and domain characteristics, using either a single feature type or a combination of several to improve prediction accuracy. In this chapter, we focus on TransFun, a structure-based protein function prediction method. TransFun uses embeddings from the pretrained protein language model ESM-1b to capture sequence features and combines them with AlphaFold-predicted structures to predict protein functions. Availability: https://github.com/jianlin-cheng/TransFun.
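The idea of combining per-residue sequence embeddings with a predicted structure can be illustrated with a small, self-contained sketch. This is not TransFun's implementation (see the repository for that). It assumes only that a structure is reduced to C-alpha coordinates and that each residue already has an embedding vector. The function names, the 10 angstrom cutoff, and the simple mean aggregation are all hypothetical choices made for illustration.

```python
import math


def contact_graph(coords, cutoff=10.0):
    """Link residues whose C-alpha atoms lie within `cutoff` angstroms.

    `coords` is a list of (x, y, z) tuples, one per residue.
    The cutoff value is an illustrative assumption.
    """
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def message_pass(features, neighbors):
    """One aggregation round: average each residue with its structural neighbors."""
    return [
        mean_vector([features[i]] + [features[j] for j in neighbors[i]])
        for i in range(len(features))
    ]


def protein_representation(coords, embeddings, cutoff=10.0):
    """Fuse structure and sequence features into one protein-level vector.

    `embeddings` stands in for per-residue language-model embeddings;
    here they are plain lists of floats supplied by the caller.
    """
    if len(coords) != len(embeddings):
        raise ValueError("need one embedding per residue")
    neighbors = contact_graph(coords, cutoff)
    updated = message_pass(embeddings, neighbors)
    return mean_vector(updated)  # mean-pool residues into a single vector


if __name__ == "__main__":
    # Toy input: three residues in contact and one distant residue.
    toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
    toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [4.0, 4.0]]
    print(protein_representation(toy_coords, toy_embeddings))
```

In a real pipeline the pooled vector would feed a classifier over function labels, and the aggregation step would be a learned graph neural network layer rather than a fixed average.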