POLAT: Protein function prediction based on soft mask graph network and residue-Label ATtention.

Affiliation

Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China.

Publication information

Comput Biol Chem. 2024 Jun;110:108064. doi: 10.1016/j.compbiolchem.2024.108064. Epub 2024 Apr 18.

Abstract

MOTIVATION

Elucidating protein function is a central problem in biochemistry, genetics, and molecular biology. Because far more proteins have known sequences than have annotated functions, computational methods for protein function prediction are critical. Protein structure correlates strongly with function, so recent advances in structure prediction make it feasible to predict function from structure. However, current structure-based methods overlook the fact that individual residues may contribute differently to a protein's function, and they do not model the correlation between residues and functions. How to effectively use the relationship between residues and function-level information to predict protein function remains an open problem.

RESULT

We propose POLAT, a protein function prediction method based on soft mask graph networks and residue-label attention, which combines sequence features, predicted structure features, and function-level information. Soft mask graph networks adaptively extract the residues relevant to functions. A residue-label attention mechanism then produces protein-level encoded features, which are concatenated with a protein-level embedding and fed into a dense classifier that outputs a probability for each function. On the PDB cdhit test set, POLAT achieves Fmax of 0.670, 0.515, and 0.578 and AUPR of 0.677, 0.409, and 0.507 for the MFO, BPO, and CCO domains, respectively, outperforming the existing structure-based state-of-the-art (SOTA) method GAT-GO (Fmax 0.633, 0.492, 0.547; AUPR 0.660, 0.381, 0.479). In extensive experiments, POLAT is also competitive with sequence-based and multimodal methods and achieves SOTA performance on three of six metrics.
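The pipeline described above (soft masking of residues, residue-label attention, concatenation with a protein-level embedding, dense classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no equations, so the sigmoid gate, the scaled dot-product attention, the flattening step, and all names (soft_mask, residue_label_attention, predict, w_gate, label_emb, w_out, b_out) are assumptions made for illustration. The graph network that would produce the per-residue features h is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_mask(h, w_gate):
    """Assumed soft mask: scale each residue by a learned weight in (0, 1).

    h: (n_residues, d) residue features, e.g. from a graph network.
    w_gate: (d, 1) hypothetical gate parameters.
    """
    gate = sigmoid(h @ w_gate)              # (n_residues, 1)
    return gate * h, gate

def residue_label_attention(h, label_emb):
    """Assumed residue-label attention: one attention-pooled vector per label.

    label_emb: (n_labels, d) hypothetical function (label) embeddings.
    """
    scores = label_emb @ h.T / np.sqrt(h.shape[1])  # (n_labels, n_residues)
    attn = softmax(scores, axis=1)                  # each row sums to 1
    return attn @ h, attn                           # (n_labels, d)

def predict(h, protein_emb, params):
    """Return one probability per function label for a single protein."""
    masked, _ = soft_mask(h, params["w_gate"])
    pooled, _ = residue_label_attention(masked, params["label_emb"])
    # Assumption: flatten the label-wise features into one protein-level
    # vector, then concatenate with the protein-level embedding.
    features = np.concatenate([pooled.reshape(-1), protein_emb])
    logits = features @ params["w_out"] + params["b_out"]
    return sigmoid(logits)                          # (n_labels,)
```

In this sketch the gate down-weights residues rather than discarding them, which is one plausible reading of "soft" masking, and the sigmoid output treats each function as an independent binary label, as is usual for multi-label function prediction.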

