Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China.
Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China; AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan; Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan.
Int J Biol Macromol. 2024 May;267(Pt 1):131311. doi: 10.1016/j.ijbiomac.2024.131311. Epub 2024 Apr 9.
In the rapidly evolving field of computational biology, accurate prediction of protein secondary structures is crucial for understanding protein functions, facilitating drug discovery, and advancing disease diagnostics. In this paper, we propose MFTrans, a deep learning-based multi-feature fusion network aimed at enhancing the precision and efficiency of Protein Secondary Structure Prediction (PSSP). The model combines a Multiple Sequence Alignment (MSA) Transformer with a multi-view deep learning architecture to capture both global and local features of protein sequences. Using a multi-feature fusion strategy, MFTrans integrates diverse features derived from protein sequences, including the MSA, sequence information, evolutionary information, and hidden state information. The MSA Transformer interleaves row and column attention across the input MSA, while a Transformer encoder and decoder enhance the extracted high-level features. After feature fusion, a hybrid architecture that combines a convolutional neural network with a bidirectional Gated Recurrent Unit (BiGRU) network extracts further high-level features. In independent tests, MFTrans shows superior generalization ability, outperforming other state-of-the-art PSSP models by 3% on average on public benchmarks including CASP12, CASP13, CASP14, TEST2016, TEST2018, and CB513. Case studies further highlight its performance in predicting mutation sites. MFTrans contributes to the protein science field, opening new avenues for drug discovery, disease diagnosis, and protein research.
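The interleaving of row and column attention over an MSA mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`self_attention`, `axial_block`), the single-head untied attention, the random projection weights, and the toy tensor sizes are all assumptions made for illustration, and multi-head attention, normalization, and feed-forward layers are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over the second-to-last axis.
    # x has shape (..., n, d); leading axes are treated as independent batches.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def axial_block(msa, params):
    # msa: (num_seqs, seq_len, d) embedding of an alignment (hypothetical layout).
    # Row attention: within each aligned sequence, positions attend to each other.
    msa = msa + self_attention(msa, *params["row"])
    # Column attention: at each aligned position, sequences attend to each other.
    cols = np.swapaxes(msa, 0, 1)              # (seq_len, num_seqs, d)
    cols = cols + self_attention(cols, *params["col"])
    return np.swapaxes(cols, 0, 1)             # back to (num_seqs, seq_len, d)

# Toy example: 4 aligned sequences, 10 positions, 8-dimensional embeddings.
rng = np.random.default_rng(0)
d = 8
params = {
    "row": [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)],
    "col": [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)],
}
msa = rng.normal(size=(4, 10, d))
out = axial_block(msa, params)
```

Stacking several such blocks lets information flow both along each sequence and across the aligned sequences, which is the intuition behind using the MSA as input rather than a single sequence.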