Combining knowledge distillation and neural networks to predict protein secondary structure.

Author information

Zhao Lufei, Li Jingyi, Zhang Biao, Jiang Xuchu

Affiliations

Agricultural Science and Engineering School, Liaocheng University, Liaocheng, 252059, China.

School of Statistics and Data Science, Shanghai University of Finance and Economics, Shanghai, 200433, China.

Publication information

Sci Rep. 2025 Aug 31;15(1):32031. doi: 10.1038/s41598-025-17513-0.

Abstract

The secondary structure of a protein is the foundation on which its three-dimensional (3D) structure is built, and the 3D structure in turn largely determines the protein's function and role in biological processes. Accurate secondary structure prediction therefore not only helps in understanding a protein's 3D conformation but also provides essential insights into its interactions, functional mechanisms, and potential applications in biomedical research. Deep learning models are particularly effective for protein secondary structure prediction because they can process complex sequence data and extract meaningful patterns, improving both prediction accuracy and efficiency. This study proposes a combined model, ITBM-KD, which integrates an improved temporal convolutional network (TCN), a bidirectional recurrent neural network (BiRNN), and a multilayer perceptron (MLP) to increase the accuracy of eight-state (Q8) and three-state (Q3) protein secondary structure prediction. By combining one-hot encoding, word-vector representations of physicochemical properties, and knowledge distillation using the ProtT5 model, the proposed model performs strongly on multiple datasets. Its effectiveness was evaluated on two classic benchmark datasets, TS115 and CB513, which contain 115 and 513 proteins, respectively. In addition, 15,078 proteins collected from the PDB between June 6, 2018, and June 6, 2020, were used to further verify the robustness and generalizability of the model. This study improves prediction accuracy and provides a useful model for understanding protein structure and function, especially in resource-limited settings.
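The abstract lists one-hot encoding as one of the model's input representations. The sketch below shows what a per-residue one-hot matrix typically looks like; it is not the authors' code. The alphabetical residue order and the extra "unknown" column for non-standard residues are assumptions, and the names `AMINO_ACIDS`, `UNKNOWN_COLUMN`, and `one_hot` are hypothetical.

```python
# Minimal one-hot encoder for protein sequences (a sketch, not the paper's code).
# Assumptions: 20 standard residues in alphabetical one-letter order, plus one
# hypothetical extra column for any non-standard residue (X, B, Z, U, O, ...).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
UNKNOWN_COLUMN = len(AMINO_ACIDS)  # index 20


def one_hot(sequence: str) -> list[list[int]]:
    """Return an L x 21 matrix with exactly one 1 in each residue's row."""
    matrix = []
    for residue in sequence.upper():
        row = [0] * (len(AMINO_ACIDS) + 1)
        row[INDEX.get(residue, UNKNOWN_COLUMN)] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for residue, row in zip("MKVX", one_hot("MKVX")):
        print(residue, row.index(1))  # column holding the single 1 in each row
```

Running it prints `M 10`, `K 8`, `V 17`, and `X 20`. In a model like the one described, a matrix of this kind would presumably be concatenated per residue with the physicochemical word-vector features before entering the network; how ITBM-KD actually fuses them is not stated in the abstract.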
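The temporal convolutional network in the abstract is built, in its generic form, from dilated causal 1-D convolutions. The single-channel sketch below illustrates only that generic building block; the paper's "improved" TCN may change it, and the names `dilated_causal_conv1d` and `receptive_field` are hypothetical.

```python
# Single-channel dilated causal 1-D convolution, the basic building block of a
# generic TCN (a sketch; the paper's "improved" TCN may differ).
def dilated_causal_conv1d(x: list[float], weights: list[float], dilation: int) -> list[float]:
    """y[t] = sum_i weights[i] * x[t - i * dilation], treating x[j] = 0 for j < 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation
            if j >= 0:  # implicit left zero-padding keeps the output causal
                acc += w * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Number of input positions one output sees after stacking the layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    print(dilated_causal_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
    print(receptive_field(3, [1, 2, 4, 8]))
```

This prints `[1.0, 2.0, 4.0, 6.0, 8.0]` and `31`: doubling the dilation at each layer grows the receptive field exponentially with depth while the kernel stays small. A causal convolution only looks to the left of each residue; the BiRNN that the abstract pairs with the TCN reads the sequence in both directions, which is one plausible way the combined model obtains context from both sides.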
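The abstract says ProtT5 is used for knowledge distillation but does not give the loss. As an illustration only, the sketch below implements the classic soft-target distillation loss for a single residue, assuming a ProtT5-based teacher emits per-residue logits over the same secondary-structure classes as the student. The function names and the default `temperature` and `alpha` values are hypothetical.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      true_label: int, temperature: float = 2.0,
                      alpha: float = 0.5) -> float:
    """Per-residue loss: alpha * hard cross-entropy + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy on the student's ordinary (T = 1) probabilities.
    hard = -math.log(softmax(student_logits)[true_label])
    # Soft-target term: KL divergence between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps soft-term gradients on a scale comparable to the hard term.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    # Hypothetical 3-class (H, E, C) logits for one residue whose true state is C.
    print(distillation_loss([0.2, -0.5, 1.1], [0.4, -1.0, 2.0], true_label=2))
```

Over a whole protein, this per-residue loss would typically be averaged across residues. Higher temperatures soften the teacher's distribution so the student also learns from the relative probabilities of the wrong classes, not just the argmax.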
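Q8 and Q3 refer to eight-state and three-state secondary structure labels, and the corresponding accuracies are usually reported as the fraction of residues predicted correctly. The sketch below assumes one widely used reduction of DSSP's eight states to three; the paper may use a different mapping, and the names `Q8_TO_Q3`, `to_q3`, and `per_residue_accuracy` are hypothetical, as are the example label strings.

```python
# One widely used DSSP 8-to-3 reduction (an assumption; the paper's mapping may differ):
# helices H, G, I -> H; strands E, B -> E; turns T, bends S and coil -> C.
Q8_TO_Q3 = {"H": "H", "G": "H", "I": "H",
            "E": "E", "B": "E",
            "T": "C", "S": "C", "C": "C", "-": "C"}


def to_q3(q8_labels: str) -> str:
    """Collapse an eight-state label string to three states."""
    return "".join(Q8_TO_Q3[state] for state in q8_labels)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed labels must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    observed = "HHGGEEBTSC"   # hypothetical DSSP labels
    predicted = "HHHHEEETSC"  # hypothetical model output
    print("Q8:", per_residue_accuracy(predicted, observed))
    print("Q3:", per_residue_accuracy(to_q3(predicted), to_q3(observed)))
```

Here Q8 accuracy is `0.7`, but Q3 accuracy is `1.0`, because confusing a 3-10 helix (G) with an alpha helix (H), or an isolated bridge (B) with a strand (E), is an error only in the eight-state setting. Benchmark figures on sets such as CB513 and TS115 are often computed by pooling residues across all proteins before dividing, although some studies average per protein instead.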
