T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm.

Authors

Chen Tianhang, Wang Xiangeng, Chu Yanyi, Wang Yanjing, Jiang Mingming, Wei Dong-Qing, Xiong Yi

Affiliations

State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.

Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China.

Publication information

Front Microbiol. 2020 Sep 24;11:580382. doi: 10.3389/fmicb.2020.580382. eCollection 2020.

DOI:10.3389/fmicb.2020.580382
PMID:33072049
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7541839/
Abstract

Type IV secreted effectors (T4SEs) can be translocated into the cytosol of host cells via the type IV secretion system (T4SS) and cause disease. However, experimental approaches to identifying T4SEs are time- and resource-consuming, and existing machine-learning-based computational tools have clear limitations, such as the lack of interpretability of their prediction models. In this study, we propose a new model, T4SE-XGB, which uses the eXtreme gradient boosting (XGBoost) algorithm to accurately identify type IV effectors from an optimal set of features derived from protein sequences. After trying 20 different types of features, the best performance under 5-fold cross-validation, in comparison with other machine learning methods, was achieved when all features were fed into XGBoost. The ReliefF algorithm was then used to select the optimal feature set on our dataset, which further improved model performance. T4SE-XGB exhibited the highest predictive performance on the independent test set and outperformed other published prediction tools. Furthermore, the SHAP method was used to interpret the contribution of features to model predictions. Identifying key features can improve understanding of the multifactorial contributors to host-pathogen interactions and bacterial pathogenesis. Beyond type IV effector prediction, we believe the proposed framework can guide similar studies in constructing prediction methods for related biological problems. The data and source code of this study are freely available at https://github.com/CT001002/T4SE-XGB.
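The first stages of a workflow like the one the abstract describes (sequence-derived features, 5-fold cross-validation, relevance-based feature selection) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. Amino acid composition is only one common sequence descriptor and is assumed here, because the abstract does not list the 20 feature types. The feature weighting is a basic two-class Relief, a simpler relative of ReliefF. All function names are hypothetical, and the XGBoost classifier and SHAP analysis are omitted.

```python
import random
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard residue in a protein sequence."""
    seq = sequence.upper()
    # Count only standard residues so ambiguous codes (X, B, Z, ...) are ignored.
    total = sum(1 for residue in seq if residue in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / total for aa in AMINO_ACIDS]


def stratified_k_fold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def relief_weights(features: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Basic two-class Relief (not full ReliefF): one nearest hit and one nearest miss per sample.

    Assumes feature values are already scaled to [0, 1], as compositions are.
    """
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features

    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n_samples):
        row = features[i]
        hits = [j for j in range(n_samples) if j != i and labels[j] == labels[i]]
        misses = [j for j in range(n_samples) if labels[j] != labels[i]]
        if not hits or not misses:
            continue
        nearest_hit = min(hits, key=lambda j: distance(row, features[j]))
        nearest_miss = min(misses, key=lambda j: distance(row, features[j]))
        for f in range(n_features):
            # A useful feature differs at the nearest miss and agrees at the nearest hit.
            weights[f] += abs(row[f] - features[nearest_miss][f]) - abs(row[f] - features[nearest_hit][f])
    return [w / n_samples for w in weights]
```

In use, each sequence would be encoded with `amino_acid_composition`, the features ranked by `relief_weights`, and the top-ranked columns passed to a classifier trained and scored on the folds returned by `stratified_k_fold`. How many features to keep is a tuning choice that the abstract does not specify.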

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/748b/7541839/dcf21f4d5593/fmicb-11-580382-g0001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/748b/7541839/f76edddb875f/fmicb-11-580382-g0002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/748b/7541839/4b628de33aca/fmicb-11-580382-g0003.jpg
