BERT-PPII: The Polyproline Type II Helix Structure Prediction Model Based on BERT and Multichannel CNN.

Affiliations

School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China.

Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.

Publication information

Biomed Res Int. 2022 Aug 24;2022:9015123. doi: 10.1155/2022/9015123. eCollection 2022.

DOI:10.1155/2022/9015123

PMID:36060139

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9433275/

Abstract

Predicting the polyproline type II (PPII) helix structure is crucially important in many research areas, such as protein folding mechanisms, drug targets, and protein functions. However, many existing PPII helix prediction algorithms encode protein sequence information in a single way, which leads to insufficient learning of protein sequence features. To improve protein sequence encoding, this paper proposes a BERT-based PPII helix structure prediction algorithm (BERT-PPII), which learns protein sequence information with the BERT model. The BERT model's vector can fairly fuse the information of each amino acid residue in a sample, so we use this vector as the global feature that represents the sample's global contextual information. Because interactions among local amino acid residues in the protein chain strongly influence the formation of the PPII helix, we use a CNN to extract local residue features, which further enrich the representation of protein sequence samples. We then fuse the BERT vector with the CNN local features to improve the performance of PPII structure prediction. Compared with the state-of-the-art PPIIPRED method, experimental results on the unbalanced dataset show that the proposed method improves accuracy by 1% on the strict dataset and by 2% on the less strict dataset. On the balanced dataset, the AUCs of the proposed method are 0.826 on the strict dataset and 0.785 on the less strict dataset. On the independent test set, the proposed method achieves an AUC of 0.827 on the strict dataset and 0.783 on the less strict dataset. These experimental results demonstrate that the proposed BERT-PPII method achieves superior performance in predicting the PPII helix.
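
The fusion of a global sequence vector with multichannel CNN local features described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the encoder here is a hypothetical stand-in (a fixed random embedding per residue, with mean pooling as the "global" vector) rather than a pretrained BERT model, all weights are untrained, and the embedding size, kernel widths, filter counts, and function names are assumptions chosen only for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EMBED_DIM = 8             # assumed toy size; real BERT hidden sizes are much larger
KERNEL_SIZES = (2, 3, 4)  # assumed channel widths, not taken from the paper
FILTERS_PER_CHANNEL = 4   # assumed

rng = random.Random(0)    # fixed seed so the sketch is reproducible

# Hypothetical stand-in for a pretrained encoder: one fixed vector per residue type.
EMBEDDING = {aa: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for aa in AMINO_ACIDS}


def encode(sequence):
    """Return (per-residue vectors, global vector).

    Assumption: the global vector is the mean of the residue vectors,
    standing in for the sequence-level vector a BERT model would provide.
    """
    tokens = [EMBEDDING[aa] for aa in sequence]
    global_vec = [sum(col) / len(tokens) for col in zip(*tokens)]
    return tokens, global_vec


def make_filters(kernel_size):
    """Random (untrained) filters, each of shape [kernel_size][EMBED_DIM]."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
             for _ in range(kernel_size)]
            for _ in range(FILTERS_PER_CHANNEL)]


FILTERS = {k: make_filters(k) for k in KERNEL_SIZES}


def conv_channel(tokens, filters, kernel_size):
    """One CNN channel: 1D convolution, ReLU, then max-over-time pooling."""
    pooled = []
    for filt in filters:
        best = 0.0  # starting at 0 applies the ReLU floor
        for start in range(len(tokens) - kernel_size + 1):
            window = tokens[start:start + kernel_size]
            activation = sum(w * x
                             for f_row, t_row in zip(filt, window)
                             for w, x in zip(f_row, t_row))
            best = max(best, activation)
        pooled.append(best)
    return pooled


def fused_features(sequence):
    """Concatenate the global vector with the local features of every channel."""
    tokens, global_vec = encode(sequence)
    local_vec = []
    for k in KERNEL_SIZES:
        local_vec.extend(conv_channel(tokens, FILTERS[k], k))
    return global_vec + local_vec


FEATURE_DIM = EMBED_DIM + len(KERNEL_SIZES) * FILTERS_PER_CHANNEL
CLASSIFIER_WEIGHTS = [rng.uniform(-0.5, 0.5) for _ in range(FEATURE_DIM)]


def predict_ppii_probability(sequence):
    """Untrained logistic layer over the fused features (illustration only)."""
    features = fused_features(sequence)
    logit = sum(w * x for w, x in zip(CLASSIFIER_WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    print(len(fused_features("APPPLPPRSA")), predict_ppii_probability("APPPLPPRSA"))
```

Each kernel width acts as one channel that looks at a different neighbourhood size along the chain, while the global vector summarizes the whole sample; concatenating the two gives the classifier both views. In a real model the encoder and filters would be learned, so the probability printed here carries no predictive meaning.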
