InterLabelGO+: unraveling label correlations in protein function prediction.

Affiliations

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA.

Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA.

Publication information

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae655.

Abstract

MOTIVATION

Accurate protein function prediction is crucial for understanding biological processes and advancing biomedical research. However, the number of known protein sequences is growing far faster than their functions can be characterized experimentally, which makes automated computational methods necessary.

RESULTS

We present InterLabelGO+, a hybrid approach that integrates a deep learning-based method with an alignment-based method for improved protein function prediction. InterLabelGO+ incorporates a novel loss function that addresses label dependency and label imbalance, and it further improves performance by dynamically weighting the alignment-based component. A preliminary version of InterLabelGO+ performed strongly in the CAFA5 challenge, ranking sixth out of 1625 participating teams. Comprehensive evaluations on large-scale protein function prediction tasks show that InterLabelGO+ accurately predicts Gene Ontology terms across various functional categories, as measured by multiple evaluation metrics.
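
The abstract does not say how the alignment-based component is weighted. Purely as an illustration of what a hybrid, dynamically weighted combination can look like, the sketch below assumes each query has a best-hit sequence identity in [0, 1] that is used directly as the weight on the alignment-based scores. The function name `combine_scores`, the identity-based weight, and the representation of predictions as dicts from GO term to score are all hypothetical; this is not InterLabelGO+'s actual scheme.

```python
from typing import Dict


def combine_scores(
    dl_scores: Dict[str, float],
    align_scores: Dict[str, float],
    identity: float,
) -> Dict[str, float]:
    """Blend deep-learning and alignment-based GO term scores.

    Hypothetical weighting (an assumption, not the published method): the
    alignment component's weight equals the sequence identity of the best
    template hit, so close homologs dominate while distant or missing hits
    defer to the neural network.
    """
    # Clamp the weight into [0, 1] so out-of-range inputs stay meaningful.
    w = min(max(identity, 0.0), 1.0)
    # Score every term predicted by either source; a term absent from one
    # source is treated as scoring 0 there.
    terms = set(dl_scores) | set(align_scores)
    return {
        term: w * align_scores.get(term, 0.0) + (1.0 - w) * dl_scores.get(term, 0.0)
        for term in terms
    }


if __name__ == "__main__":
    dl = {"GO:0003824": 0.8, "GO:0005515": 0.4}
    aln = {"GO:0003824": 0.6, "GO:0016787": 1.0}
    for term, score in sorted(combine_scores(dl, aln, identity=0.25).items()):
        print(term, round(score, 3))
```

With an identity of 0.25, a term scored 0.8 by the network and 0.6 by alignment blends to 0.25 × 0.6 + 0.75 × 0.8 = 0.75. A fixed weight would be simpler, but tying the weight to hit quality lets the alignment evidence count only when a close homolog actually exists.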

AVAILABILITY AND IMPLEMENTATION

The source code and datasets for InterLabelGO+ are freely available on GitHub at https://github.com/QuanEvans/InterLabelGO. A web server is available at https://seq2fun.dcmb.med.umich.edu/InterLabelGO/. The software is implemented in Python and PyTorch and is supported on Linux and macOS.