Effective prediction of bacterial type IV secreted effectors by combined features of both C-termini and N-termini.

Authors

Wang Yu, Guo Yanzhi, Pu Xuemei, Li Menglong

Affiliations

College of Chemistry, Sichuan University, Chengdu, 610064, China.

College of Materials and Chemistry & Chemical Engineering, Chengdu University of Technology, Chengdu, 610059, China.

Publication information

J Comput Aided Mol Des. 2017 Nov;31(11):1029-1038. doi: 10.1007/s10822-017-0080-z. Epub 2017 Nov 10.

DOI:10.1007/s10822-017-0080-z

PMID:29127583

Abstract

Various bacterial pathogens deliver secreted substrates, also called effectors, into host cells through type IV secretion systems (T4SSs) and thereby cause disease. Because T4SS secreted effectors (T4SEs) play important roles in pathogen-host interactions, identifying them is crucial to understanding the pathogenic mechanisms of T4SSs. A few machine learning methods for T4SE prediction have been developed using features of C-terminal residues. However, recent studies have shown that targeting information can also be encoded in the N-terminal region of at least some T4SEs. In this study, we present an effective method for T4SE prediction that integrates both N-terminal and C-terminal sequence information. First, we collected from the literature a comprehensive dataset of known T4SEs and non-T4SEs across multiple bacterial species. Then, three types of features, namely amino acid composition; composition, transition and distribution; and position-specific scoring matrices, were calculated for the 50 N-terminal and 100 C-terminal residues. Next, we used information gain to rank the importance of the 150 residue positions for T4SE secretion signaling. Finally, 125 residue positions were selected for the prediction model that classifies T4SEs and non-T4SEs. The support vector machine model yields an area under the receiver operating characteristic curve of 0.916 in fivefold cross-validation and an accuracy of 85.29% on the independent test set.
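The first steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy sequence, and the choice to treat the residue at one position as a discrete feature are assumptions made for illustration, since the abstract does not state the exact encoding. The sketch extracts the 50 N-terminal and 100 C-terminal residues, computes amino acid composition for each window, and scores a discrete feature by information gain. The composition, transition and distribution features, the position-specific scoring matrices, and the support vector machine are omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def terminal_windows(seq, n_len=50, c_len=100):
    """Return the first n_len and the last c_len residues of a sequence."""
    seq = seq.upper()
    return seq[:n_len], seq[-c_len:]


def aac(fragment):
    """Amino acid composition: fraction of each standard residue."""
    counts = Counter(fragment)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """H(labels) minus the entropy of labels conditioned on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical 160-residue toy sequence, not real data.
toy_seq = "M" + "K" * 60 + "E" * 99
n_term, c_term = terminal_windows(toy_seq)
features = aac(n_term) + aac(c_term)  # 20 + 20 composition values

# Toy example: the residue at one position across four labelled sequences.
ig_informative = information_gain(["K", "K", "E", "E"], [1, 1, 0, 0])
ig_uninformative = information_gain(["K", "E", "K", "E"], [1, 1, 0, 0])
```

In this toy setting a position whose residue perfectly separates the two classes receives the maximum gain of one bit, while a position unrelated to the class labels receives zero. Ranking positions by such a score and keeping the top-ranked ones mirrors the selection step described above.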
