Incorporating a transfer learning technique with amino acid embeddings to efficiently predict N-linked glycosylation sites in ion channels.

Affiliations

Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan.

Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 106, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, 106, Taiwan.

Publication information

Comput Biol Med. 2021 Mar;130:104212. doi: 10.1016/j.compbiomed.2021.104212. Epub 2021 Jan 7.

DOI:10.1016/j.compbiomed.2021.104212

PMID:33454535

Abstract

Glycosylation is a dynamic enzymatic process that attaches glycans to proteins or to other organic molecules such as lipoproteins. Research has shown that this process plays a fundamental role in modulating the function of ion channel proteins. This study used a computational method to predict N-linked glycosylation sites, the most common type, in ion channel proteins. For each segment of an ion channel protein centered on an N-linked glycosylation site, the amino acid embedding vectors of the residues were concatenated to form the prediction features. We experimented with two different models for converting amino acids into embeddings: one trained on ion channel sequences and the other on a large dataset of more than one million protein sequences. The latter model, motivated by transfer learning, proved to be the more efficient feature extractor. Our best model combined this transfer learning approach with hyperparameter tuning by random search on 5-fold cross-validation data. It achieved an accuracy, specificity, sensitivity, and Matthews correlation coefficient (MCC) of 93.4%, 92.8%, 98.6%, and 0.726, respectively; the corresponding scores on an independent test set were 92.9%, 92.2%, 99%, and 0.717. These results exceed those obtained with position-specific scoring matrix (PSSM) features, which are predominantly used in post-translational modification site prediction. Furthermore, compared with N-GlyDE, GlycoEP, and SPRINT-Gly, the most recent N-linked glycosylation site predictors, our model yields higher scores on all four metrics, further demonstrating the efficiency of our approach.
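To make the feature-construction and evaluation steps concrete, here is a minimal sketch assuming a fixed half-window width and a precomputed per-residue embedding table. The names `extract_window`, `embed_window`, `metrics`, the padding symbol `X`, the window width, and the toy two-dimensional `TOY_TABLE` are all hypothetical illustrations: the abstract does not state the window size, embedding dimension, padding scheme, or classifier, and the real embeddings come from models trained on ion channel sequences or on more than one million protein sequences. The metric formulas are the standard confusion-matrix definitions.

```python
import math
from typing import Dict, List, Mapping, Sequence

PAD = "X"  # hypothetical symbol for window positions that fall outside the sequence
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy 2-dimensional embedding table (hypothetical). In the study the vectors come
# from a trained embedding model; here they are placeholders so the code runs.
TOY_TABLE: Dict[str, List[float]] = {
    aa: [float(i + 1), (i + 1) / 10] for i, aa in enumerate(AMINO_ACIDS)
}
TOY_TABLE[PAD] = [0.0, 0.0]


def extract_window(seq: str, center: int, half_width: int) -> str:
    """Return the (2 * half_width + 1)-residue segment centered at index `center`,
    padding with PAD wherever the window runs past either end of `seq`."""
    return "".join(
        seq[i] if 0 <= i < len(seq) else PAD
        for i in range(center - half_width, center + half_width + 1)
    )


def embed_window(window: str, table: Mapping[str, Sequence[float]]) -> List[float]:
    """Concatenate per-residue embedding vectors into one flat feature vector."""
    features: List[float] = []
    for residue in window:
        features.extend(table[residue])
    return features


def metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, specificity, sensitivity and Matthews correlation coefficient
    from confusion-matrix counts (assumes at least one positive and one negative)."""
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0,
    }


if __name__ == "__main__":
    seq = "MNGTSAK"  # toy sequence with an N-G-T sequon at index 1
    window = extract_window(seq, center=1, half_width=3)
    print(window)  # XXMNGTS
    features = embed_window(window, TOY_TABLE)
    print(len(features))  # 14 = 7 residues x 2 dimensions
    print(metrics(tp=90, tn=80, fp=20, fn=10))
```

Swapping `TOY_TABLE` for vectors from a model trained on a large, general protein corpus corresponds to the transfer learning idea in the abstract; the downstream classifier and the random search over hyperparameters with 5-fold cross-validation are not shown here.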


Similar articles

Incorporating a transfer learning technique with amino acid embeddings to efficiently predict N-linked glycosylation sites in ion channels.

Comput Biol Med. 2021 Mar;130:104212. doi: 10.1016/j.compbiomed.2021.104212. Epub 2021 Jan 7.

Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins.

Methods Mol Biol. 2022;2499:177-186. doi: 10.1007/978-1-0716-2317-6_9.

LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model.

Glycobiology. 2023 Jun 3;33(5):411-422. doi: 10.1093/glycob/cwad033.

iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features.

BMC Bioinformatics. 2020 Dec 9;21(1):568. doi: 10.1186/s12859-020-03916-5.

SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties.

Bioinformatics. 2019 Oct 15;35(20):4140-4146. doi: 10.1093/bioinformatics/btz215.

Prediction of N-linked glycosylation sites using position relative features and statistical moments.

PLoS One. 2017 Aug 10;12(8):e0181966. doi: 10.1371/journal.pone.0181966. eCollection 2017.

Prediction of O-glycosylation sites based on multi-scale composition of amino acids and feature selection.

Med Biol Eng Comput. 2015 Jun;53(6):535-44. doi: 10.1007/s11517-015-1268-9. Epub 2015 Mar 10.

UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines.

BMC Syst Biol. 2016 Jan 11;10 Suppl 1(Suppl 1):6. doi: 10.1186/s12918-015-0246-z.

EMNGly: predicting N-linked glycosylation sites using the language models for feature extraction.

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad650.

GlycoMine: a machine learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome.

Bioinformatics. 2015 May 1;31(9):1411-9. doi: 10.1093/bioinformatics/btu852. Epub 2015 Jan 6.

Cited by

Ion channel trafficking implications in heart failure.

Front Cardiovasc Med. 2024 Feb 14;11:1351496. doi: 10.3389/fcvm.2024.1351496. eCollection 2024.

An analytical study on the identification of N-linked glycosylation sites using machine learning model.

PeerJ Comput Sci. 2022 Sep 21;8:e1069. doi: 10.7717/peerj-cs.1069. eCollection 2022.
