MHTAPred-SS: A Highly Targeted Autoencoder-Driven Deep Multi-Task Learning Framework for Accurate Protein Secondary Structure Prediction.

Authors

Feng Runqiu, Wang Xun, Xia Zhijun, Han Tongyu, Wang Hanyu, Yu Wenqian

Affiliation

Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China.

Publication

Int J Mol Sci. 2024 Dec 15;25(24):13444. doi: 10.3390/ijms252413444.

Abstract

Accurate protein secondary structure prediction (PSSP) plays a crucial role in biopharmaceutics and disease diagnosis. Current prediction methods are mainly based on multiple sequence alignment (MSA) encoding and collaborative operations of diverse networks. However, existing encoding approaches lead to poor feature space utilization, and encoding quality decreases with fewer homologous proteins. Moreover, the performance of simple stacked networks is greatly limited by feature extraction capabilities and learning strategies. To this end, we propose MHTAPred-SS, a novel PSSP framework based on the fusion of six features, including the embedding feature derived from a pre-trained protein language model. First, we propose a highly targeted autoencoder (HTA) as the driver to encode sequences in a homologous protein-independent manner. Second, under the guidance of biological knowledge, we design a protein secondary structure prediction model based on the multi-task learning strategy (PSSP-MTL). Experimental results on six independent test sets show that MHTAPred-SS achieves state-of-the-art performance, with values of 88.14%, 84.89%, 78.74% and 77.15% for Q3, SOV3, Q8 and SOV8 metrics on the TEST2016 dataset, respectively. Additionally, we demonstrate that MHTAPred-SS has significant advantages in single-category and boundary secondary structure prediction, and can finely capture the distribution of secondary structure segments, thereby contributing to subsequent tasks.
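The Q3 and Q8 figures quoted above are per-residue accuracies over three and eight secondary structure states. The following is a minimal sketch of how such scores are commonly computed, not the authors' evaluation code. The 8-to-3 state reduction used here (H, G, I to helix; E, B to strand; everything else to coil) is one widely used convention and is an assumption, since the abstract does not state which mapping the paper applies. The function names and example strings are hypothetical. SOV, which scores segment overlap rather than individual residues, is not covered.

```python
# Assumed DSSP 8-state to 3-state reduction (a common convention;
# the abstract does not say which mapping the paper uses).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helix classes
    "E": "E", "B": "E",             # strand classes
    "T": "C", "S": "C", "C": "C",   # turn, bend, coil
}


def reduce_to_three(ss8: str) -> str:
    """Map an 8-state secondary structure string to 3 states."""
    return "".join(EIGHT_TO_THREE[state] for state in ss8)


def q_accuracy(true_ss: str, pred_ss: str) -> float:
    """Percentage of residues whose predicted state matches the true state."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have equal length")
    if not true_ss:
        raise ValueError("sequences must be non-empty")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


# Hypothetical 10-residue example (illustrative labels, not real data).
true_ss8 = "CHHHHGTEEC"
pred_ss8 = "CHHHGGTEBC"

q8 = q_accuracy(true_ss8, pred_ss8)
q3 = q_accuracy(reduce_to_three(true_ss8), reduce_to_three(pred_ss8))
print(f"Q8 = {q8:.2f}%, Q3 = {q3:.2f}%")
```

In this toy example the two mismatches (H predicted as G, E predicted as B) fall within the same 3-state class, so they lower Q8 but not Q3. This is why Q8 is typically the harder metric, consistent with the gap between the 88.14% Q3 and 78.74% Q8 values reported above.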

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bbd3/11677681/2c6e969b5947/ijms-25-13444-g001.jpg
