Dong Benzhi, Liu Zheng, Xu Dali, Hou Chang, Dong Guanghui, Zhang Tianjiao, Wang Guohua
College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.
Comput Struct Biotechnol J. 2024 Mar 22;23:1364-1375. doi: 10.1016/j.csbj.2024.03.018. eCollection 2024 Dec.
Protein secondary structure prediction (PSSP) is a pivotal research task that plays a crucial role in elucidating protein functions and properties. Current prediction methods rely mainly on deep-learning techniques, with particular attention to multi-factor features. Diverging from existing approaches, in this study we placed special emphasis on the effects of amino acid properties and protein secondary structure propensity scores (SSPs) on secondary structure when selecting multi-factor features. This differential feature-selection strategy yields a distinctive and effective combination of sequence and property features. To make the best use of these multi-factor features, we introduced a hybrid deep feature extraction model. The model first employs dilated convolution (D-Conv) and a channel attention network (SENet) for local feature extraction and targeted channel enhancement. Subsequently, a combination of recurrent neural network variants (BiGRU and BiLSTM), together with a transformer module, is used to capture global bidirectional information and enhance features. This approach to multi-factor feature input and multi-level feature processing enabled a comprehensive exploration of the intricate associations among amino acid residues in protein sequences, yielding an accuracy of 84.9% and a segment overlap (Sov) score of 85.1%. The overall performance surpasses that of comparable methods. This study introduces a novel and efficient method for the PSSP domain, which is poised to deepen our understanding of the practical applications of protein molecular structures.
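The two local-feature operations named in the abstract, dilated convolution and SENet-style channel attention, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `dilated_conv1d` and `squeeze_excite`, the kernel, and the weight matrices `w1` and `w2` are hypothetical and chosen only for illustration. The abstract does not give kernel sizes, dilation rates, or channel counts.

```python
import math


def dilated_conv1d(x, kernel, dilation=1):
    """Zero-padded ("same" length) 1D dilated convolution over one channel.

    x: list of floats, one value per residue position.
    kernel: list of floats with odd length (hypothetical weights).
    dilation: spacing between kernel taps; 1 is an ordinary convolution.
    Computed as cross-correlation, as deep-learning frameworks usually do.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # input position read by this tap
            if 0 <= j < len(x):            # positions outside the sequence count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def squeeze_excite(channels, w1, w2):
    """SE-style channel attention over a list of channels.

    channels: list of C channels, each a list of floats over positions.
    w1: R x C weight matrix (bottleneck layer), w2: C x R weight matrix.
    Both matrices are hypothetical; biases are omitted for brevity.
    Returns (rescaled_channels, gates).
    """
    # Squeeze: global average over positions, one scalar per channel.
    z = [sum(ch) / len(ch) for ch in channels]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates


# With dilation=2 the kernel [1, 0, -1] compares positions i-2 and i+2.
y = dilated_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, -1.0], dilation=2)

# Zero bottleneck weights give a gate of sigmoid(0) = 0.5 for every channel.
scaled, gates = squeeze_excite([[1.0, 1.0], [3.0, 3.0]],
                               w1=[[0.0, 0.0]],
                               w2=[[1.0], [1.0]])
```

Increasing `dilation` widens the span of residues each output position sees without adding kernel weights, while the gates let the model emphasise or suppress whole feature channels. In practice both steps would be learned layers in a deep-learning framework rather than fixed lists.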