Improving Protein Secondary Structure Prediction by Deep Language Models and Transformer Networks.

Affiliations

Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, USA.

Department of Chemistry, Hubei University, Wuhan, Hubei, China.

Publication information

Methods Mol Biol. 2025;2867:43-53. doi: 10.1007/978-1-0716-4196-5_3.

Abstract

Protein secondary structure prediction is useful for many applications. It can be framed as a language translation problem: translating a sequence of amino acids (drawn from an alphabet of 20) into a sequence of secondary structure symbols (e.g., alpha helix, beta strand, and coil). Here, we develop a novel protein secondary structure predictor called TransPross. It is based on the transformer network and attention mechanism widely used in natural language processing, and it extracts evolutionary information directly from the protein language (i.e., the raw multiple sequence alignment [MSA] of a protein) to predict secondary structure. This differs from traditional methods, which first generate an MSA and then calculate expert-curated statistical profiles from it as input. The attention mechanism used by TransPross can effectively capture long-range residue-residue interactions in protein sequences to predict secondary structures. Benchmarked on several datasets, TransPross outperforms state-of-the-art methods. Moreover, our experiment shows that the prediction accuracy of TransPross correlates positively with MSA depth, and that it achieves an average prediction accuracy (i.e., Q3 score) above 80% for hard targets with few homologous sequences in their MSAs. TransPross is freely available at https://github.com/BioinfoMachineLearning/TransPro .
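The abstract reports accuracy as the Q3 score. This is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. Below is a minimal Python sketch of that metric, not code from TransPross. Two choices are assumptions for illustration. The first is the single-letter encoding H/E/C. The second is averaging per-target scores with equal weight, rather than pooling all residues across targets.

```python
from typing import Sequence

# Assumed three-state alphabet: H = helix, E = strand, C = coil.
STATES = frozenset("HEC")


def q3_score(predicted: str, actual: str) -> float:
    """Percentage of residues whose predicted 3-state label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("sequences must be non-empty")
    if not (set(predicted) | set(actual)) <= STATES:
        raise ValueError("labels must be one of H, E, C")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def mean_q3(pairs: Sequence[tuple[str, str]]) -> float:
    """Average of per-target Q3 scores, each target weighted equally (an assumption)."""
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    scores = [q3_score(pred, true) for pred, true in pairs]
    return sum(scores) / len(scores)
```

For example, `q3_score("HHHECC", "HHHCCC")` gets 5 of 6 residues right, a score of about 83.33. Averaging per target, as sketched here, keeps long proteins from dominating the mean. Pooling residues across all targets would give a different number.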
