Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function.

Affiliations

Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States.

Department of Statistics, Florida State University, Tallahassee, FL 32306, United States.

Publication information

Bioinformatics. 2023 Jun 30;39(Suppl 1):i318-i325. doi: 10.1093/bioinformatics/btad208.

Abstract

MOTIVATION

Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining protein function is still a time-consuming, low-throughput, and expensive process, which has left a large protein sequence-function gap. It is therefore important to develop computational methods that accurately predict protein function to fill this gap. Although many methods use protein sequences as input to predict function, far fewer leverage protein structures, because accurate structures were unavailable for most proteins until recently.

RESULTS

We developed TransFun, a method that uses a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences with a pre-trained protein language model (ESM) via transfer learning. It then combines these embeddings with protein 3D structures predicted by AlphaFold2 through equivariant graph neural networks. On the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods. This indicates that the language model and 3D-equivariant graph neural networks are effective ways to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions with sequence similarity-based predictions can further increase prediction accuracy.
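The abstract does not spell out how an equivariant graph network consumes a predicted structure. The sketch below illustrates the general idea under stated assumptions; it is not TransFun's code.

It follows the EGNN update rules of Satorras et al. (2021), one common formulation of an E(n)-equivariant layer. Messages depend on node features and on squared inter-residue distances. As a result, the updated node features are invariant to rotating or translating the structure, while the updated coordinates move with it.

The following choices are all hypothetical:

- The residue graph uses a 10 Å C-alpha distance cutoff.
- Random matrices stand in for trained networks.
- Mean pooling followed by a per-term sigmoid stands in for the function-prediction head.
- `build_contact_graph`, `EGNNLayer`, and `predict_go_scores` are illustrative names.

In a TransFun-like setting, the node features would be per-residue ESM embeddings and the coordinates would come from AlphaFold2 models.

```python
import numpy as np


def build_contact_graph(coords, cutoff=10.0):
    """Directed edges (i, j) between residues whose C-alpha atoms are closer than `cutoff`.

    The 10 angstrom default is a hypothetical choice for illustration.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    mask = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)  # drop self-loops
    src, dst = np.nonzero(mask)
    return src, dst


def silu(z):
    return z / (1.0 + np.exp(-z))


class EGNNLayer:
    """One EGNN-style layer (Satorras et al., 2021) with random, untrained weights.

    Node features h stay invariant under rotation/translation; coordinates x are equivariant.
    """

    def __init__(self, dim, hidden, rng):
        self.W_e = rng.normal(scale=0.1, size=(2 * dim + 1, hidden))  # edge (message) map
        self.w_x = rng.normal(scale=0.1, size=hidden)                 # message -> coordinate weight
        self.W_h = rng.normal(scale=0.1, size=(dim + hidden, dim))    # node update map

    def __call__(self, h, x, src, dst):
        n = len(h)
        rel = x[src] - x[dst]                          # x_i - x_j, rotates with the input
        d2 = np.sum(rel ** 2, axis=1, keepdims=True)   # ||x_i - x_j||^2, invariant
        m = silu(np.concatenate([h[src], h[dst], d2], axis=1) @ self.W_e)  # messages m_ij

        # Coordinate update: x_i += mean over neighbors j of (x_i - x_j) * phi_x(m_ij)
        dx = np.zeros_like(x)
        np.add.at(dx, src, rel * np.tanh(m @ self.w_x)[:, None])
        degree = np.maximum(np.bincount(src, minlength=n), 1)[:, None]
        x_new = x + dx / degree

        # Feature update: h_i += phi_h(h_i, sum over neighbors j of m_ij)
        agg = np.zeros((n, m.shape[1]))
        np.add.at(agg, src, m)
        h_new = h + silu(np.concatenate([h, agg], axis=1) @ self.W_h)
        return h_new, x_new


def predict_go_scores(h, W_out):
    """Mean-pool residue features and map them to independent per-GO-term probabilities."""
    return 1.0 / (1.0 + np.exp(-(h.mean(axis=0) @ W_out)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_res, dim, n_terms = 30, 8, 5
    feats = rng.normal(size=(n_res, dim))             # stand-in for per-residue ESM embeddings
    coords = rng.normal(scale=6.0, size=(n_res, 3))   # stand-in for AlphaFold2 C-alpha coordinates
    layer = EGNNLayer(dim, 16, rng)
    W_out = rng.normal(scale=0.1, size=(dim, n_terms))

    src, dst = build_contact_graph(coords)
    h, x = layer(feats, coords, src, dst)

    # Rotate/reflect and translate the structure. The graph depends only on distances,
    # so the same edges are reused.
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    t = rng.normal(size=3)
    h_rot, x_rot = layer(feats, coords @ Q.T + t, src, dst)
    print(np.allclose(predict_go_scores(h, W_out), predict_go_scores(h_rot, W_out)))  # True
    print(np.allclose(x @ Q.T + t, x_rot))  # True
```

The final check demonstrates the property that makes such layers a natural fit for AlphaFold2 models: the function scores do not depend on where the structure sits in space or how it is oriented. The abstract does not state whether TransFun uses this exact layer, cutoff, or pooling.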

AVAILABILITY AND IMPLEMENTATION

The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3dd/10311302/0d67a2483b44/btad208f1.jpg