ASPIRER: a new computational approach for identifying non-classical secreted proteins based on deep learning.

Affiliations

Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.

Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria, Australia.

Publication information

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac031.

DOI:10.1093/bib/bbac031
PMID:35176756
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8921646/
Abstract

Protein secretion has a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. Among these, the non-classical secretion pathway has recently received increasing attention, but its exact mechanism remains unclear. Non-classical secreted proteins (NCSPs) are a class of secreted proteins lacking signal peptides and motifs. Several NCSP predictors have been proposed, and most of them use the whole amino acid sequence of NCSPs to construct the model. However, the sequence length of different proteins varies greatly. In addition, not all regions of a protein are equally important, and some local regions are not relevant to secretion. The functional regions of the protein, particularly the N- and C-terminal regions, contain important determinants for secretion. In this study, we propose a new hybrid deep learning-based framework, referred to as ASPIRER, which improves the prediction of NCSPs from amino acid sequences. More specifically, it combines a whole sequence-based XGBoost model and an N-terminal sequence-based convolutional neural network model. Five-fold cross-validation and independent tests demonstrate that ASPIRER outperforms existing state-of-the-art approaches. The source code and curated datasets of ASPIRER are publicly available at https://github.com/yanwu20/ASPIRER/. ASPIRER is anticipated to be a useful tool for improved prediction of novel putative NCSPs from sequence information and prioritization of candidate proteins for follow-up experimental validation.
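
The abstract contrasts two views of a protein: the whole sequence, whose length varies greatly, and a fixed N-terminal region. The sketch below is only an illustration of that idea, not the ASPIRER implementation. It shows one length-independent whole-sequence feature (amino acid composition), a fixed-size one-hot encoding of the N-terminus, and a weighted average of two model probabilities. The function names, the window length of 50, and the averaging rule are all assumptions made for this example; the abstract does not specify the features, the window size, or how the two models are combined.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> List[float]:
    """Whole-sequence view: fraction of each standard residue.

    The output always has 20 values, whatever the sequence length.
    """
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def n_terminal_one_hot(seq: str, window: int = 50) -> List[List[int]]:
    """N-terminal view: one-hot rows for the first `window` residues.

    Sequences shorter than `window` are padded with all-zero rows, and
    non-standard residues also map to all-zero rows. The default of 50
    is an arbitrary illustrative value, not taken from the paper.
    """
    rows = []
    for i in range(window):
        row = [0] * len(AMINO_ACIDS)
        if i < len(seq):
            idx = AMINO_ACIDS.find(seq[i].upper())
            if idx >= 0:
                row[idx] = 1
        rows.append(row)
    return rows


def combine(p_whole: float, p_nterm: float, weight: float = 0.5) -> float:
    """Hypothetical fusion rule: weighted average of two probabilities."""
    return weight * p_whole + (1.0 - weight) * p_nterm
```

In a setup like this, the 20-value composition vector could feed a gradient-boosting classifier and the `window` x 20 matrix could feed a 1D convolutional network; `combine` would then merge their two predicted probabilities into one score.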

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/18ca/8921646/d34ced765285/bbac031f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/18ca/8921646/39ad77a476b4/bbac031f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/18ca/8921646/019225e9a1c9/bbac031f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/18ca/8921646/44d93e62a52a/bbac031f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/18ca/8921646/69d921fbf783/bbac031f5.jpg

Similar articles

1
ASPIRER: a new computational approach for identifying non-classical secreted proteins based on deep learning.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac031.
2
NCSP-PLM: An ensemble learning framework for predicting non-classical secreted proteins based on protein language models and deep learning.
Math Biosci Eng. 2024 Jan;21(1):1472-1488. doi: 10.3934/mbe.2024063. Epub 2022 Dec 28.
3
PeNGaRoo, a combined gradient boosting and ensemble learning framework for predicting non-classical secreted proteins.
Bioinformatics. 2020 Feb 1;36(3):704-712. doi: 10.1093/bioinformatics/btz629.
4
Model fusion for predicting unconventional proteins secreted by exosomes using deep learning.
Proteomics. 2024 Sep;24(17):e2300184. doi: 10.1002/pmic.202300184. Epub 2024 Apr 21.
5
DeepT3: deep convolutional neural networks accurately identify Gram-negative bacterial type III secreted effectors using the N-terminal sequence.
Bioinformatics. 2019 Jun 1;35(12):2051-2057. doi: 10.1093/bioinformatics/bty931.
6
DP-site: A dual deep learning-based method for protein-peptide interaction site prediction.
Methods. 2024 Sep;229:17-29. doi: 10.1016/j.ymeth.2024.06.001. Epub 2024 Jun 12.
7
Convolutional neural networks with image representation of amino acid sequences for protein function prediction.
Comput Biol Chem. 2021 Jun;92:107494. doi: 10.1016/j.compbiolchem.2021.107494. Epub 2021 Apr 24.
8
NonClasGP-Pred: robust and efficient prediction of non-classically secreted proteins by integrating subset-specific optimal models of imbalanced data.
Microb Genom. 2020 Dec;6(12). doi: 10.1099/mgen.0.000483. Epub 2020 Nov 27.
9
TEC-miTarget: enhancing microRNA target prediction based on deep learning of ribonucleic acid sequences.
BMC Bioinformatics. 2024 Apr 20;25(1):159. doi: 10.1186/s12859-024-05780-z.
10
mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations.
J Mol Biol. 2024 Sep 1;436(17):168687. doi: 10.1016/j.jmb.2024.168687. Epub 2024 Jun 25.

Cited by

1
iNClassSec-ESM: Discovering potential non-classical secreted proteins through a novel protein language model.
Comput Struct Biotechnol J. 2025 Mar 28;27:1350-1358. doi: 10.1016/j.csbj.2025.03.043. eCollection 2025.
2
HPClas: A data-driven approach for identifying halophilic proteins based on catBoost.
mLife. 2024 Jul 20;3(4):515-526. doi: 10.1002/mlf2.12125. eCollection 2024 Dec.
3
Leucine-rich repeat proteins of that interact to host glycosaminoglycans and integrins.
Front Microbiol. 2024 Nov 26;15:1497712. doi: 10.3389/fmicb.2024.1497712. eCollection 2024.
4
HemoFuse: multi-feature fusion based on multi-head cross-attention for identification of hemolytic peptides.
Sci Rep. 2024 Sep 28;14(1):22518. doi: 10.1038/s41598-024-74326-3.
5
Molecular Characterization and Functional Analysis of a Serine Protease Inhibitor, Smserpin-p46.
Microorganisms. 2024 Jun 7;12(6):1164. doi: 10.3390/microorganisms12061164.
6
MERITS: a web-based integrated PE/PPE protein database.
Bioinform Adv. 2024 Mar 2;4(1):vbae035. doi: 10.1093/bioadv/vbae035. eCollection 2024.
7
Artificial intelligence-driven systems engineering for next-generation plant-derived biopharmaceuticals.
Front Plant Sci. 2023 Nov 15;14:1252166. doi: 10.3389/fpls.2023.1252166. eCollection 2023.
8
A voting-based machine learning approach for classifying biological and clinical datasets.
BMC Bioinformatics. 2023 Apr 11;24(1):140. doi: 10.1186/s12859-023-05274-4.
9
Clarion is a multi-label problem transformation method for identifying mRNA subcellular localizations.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac467.
10
PreAcrs: a machine learning framework for identifying anti-CRISPR proteins.
BMC Bioinformatics. 2022 Oct 25;23(1):444. doi: 10.1186/s12859-022-04986-3.

References

1
Porpoise: a new approach for accurate prediction of RNA pseudouridine sites.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab245.
2
iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization.
Nucleic Acids Res. 2021 Jun 4;49(10):e60. doi: 10.1093/nar/gkab122.
3
NonClasGP-Pred: robust and efficient prediction of non-classically secreted proteins by integrating subset-specific optimal models of imbalanced data.
Microb Genom. 2020 Dec;6(12). doi: 10.1099/mgen.0.000483. Epub 2020 Nov 27.
4
Computational identification of eukaryotic promoters based on cascaded deep capsule neural networks.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa299.
5
DeepTorrent: a deep learning-based approach for predicting DNA N4-methylcytosine sites.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa124.
6
Computational prediction and interpretation of both general and specific types of promoters in Escherichia coli by exploiting a stacked ensemble-learning framework.
Brief Bioinform. 2021 Mar 22;22(2):2126-2140. doi: 10.1093/bib/bbaa049.
7
Principle and potential applications of the non-classical protein secretory pathway in bacteria.
Appl Microbiol Biotechnol. 2020 Feb;104(3):953-965. doi: 10.1007/s00253-019-10285-4. Epub 2019 Dec 18.
8
Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences.
Brief Bioinform. 2020 Sep 25;21(5):1676-1696. doi: 10.1093/bib/bbz112.
9
DeepCleave: a deep learning predictor for caspase and matrix metalloprotease substrates and cleavage sites.
Bioinformatics. 2020 Feb 15;36(4):1057-1065. doi: 10.1093/bioinformatics/btz721.
10
PeNGaRoo, a combined gradient boosting and ensemble learning framework for predicting non-classical secreted proteins.
Bioinformatics. 2020 Feb 1;36(3):704-712. doi: 10.1093/bioinformatics/btz629.