T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors.

Author information

Hui Xinjie, Chen Zewei, Lin Mingxiong, Zhang Junya, Hu Yueming, Zeng Yingying, Cheng Xi, Ou-Yang Le, Sun Ming-An, White Aaron P, Wang Yejun

Affiliations

Department of Cell Biology and Genetics, School of Basic Medicine, Shenzhen University Health Science Center, Shenzhen, China.

Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, College of Information Engineering, Shenzhen University, Shenzhen, China.

Publication information

mSystems. 2020 Aug 4;5(4):e00288-20. doi: 10.1128/mSystems.00288-20.

Abstract

Many Gram-negative bacteria infect hosts and cause disease by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. Despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging. Traditional prediction models have focused on atypical sequence features buried in the N-terminal peptides of T3SEs, but these models have had high false-positive rates. In this research, we integrated promoter information with characteristic protein features of signal regions, chaperone-binding domains, and effector domains for T3SE prediction. Machine learning algorithms, including deep learning, were adopted to predict the atypical features mainly buried in the signal sequences of T3SEs, and a voting-based ensemble model was then developed to integrate the individual prediction results. We assembled these components into a unified T3SE prediction pipeline, T3SEpp, which integrates the results of the individual modules, resulting in high accuracy (i.e., ∼0.94) and a >1-fold reduction in the false-positive rate compared to that of state-of-the-art software tools. The T3SEpp pipeline and the sequence features observed here will facilitate the accurate identification of new T3SEs, with numerous benefits for future studies on host-pathogen interactions.

Type III secreted effector (T3SE) prediction remains a major computational challenge. In practical applications, current software tools often suffer from high false-positive rates. One contributing factor could be the relatively narrow range of biological features used to design and train the models. In this research, we comprehensively surveyed the sequence-based features of T3SEs, including signal sequences, chaperone-binding domains, effector domains, and transcription factor-binding promoter sites, and assembled a unified prediction pipeline that integrates these multi-aspect biological features through homology-based and multiple machine learning models. To our knowledge, this is the most comprehensive analysis of biological sequence features for T3SEs compiled to date. The T3SEpp pipeline, which integrates this variety of features and combines different models, showed high accuracy and should facilitate more accurate identification of T3SEs in new and existing bacterial whole-genome sequences.
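The abstract describes a voting-based ensemble that combines the outputs of individual prediction modules but does not specify the voting rule. The following is only a minimal sketch of one common variant, a strict majority vote over thresholded module scores. The module names, the example scores, the 0.5 cutoff, and the functions `threshold_calls` and `majority_vote` are all hypothetical illustrations and are not taken from T3SEpp.

```python
from typing import Dict, Optional


def threshold_calls(scores: Dict[str, float], cutoff: float = 0.5) -> Dict[str, bool]:
    """Turn per-module scores into yes/no calls (the cutoff is an assumed value)."""
    return {name: score >= cutoff for name, score in scores.items()}


def majority_vote(calls: Dict[str, bool], min_votes: Optional[int] = None) -> bool:
    """Return True when enough modules call the protein a candidate effector.

    By default a strict majority is required; this rule is an assumption,
    not the scheme used by T3SEpp.
    """
    if not calls:
        raise ValueError("at least one module call is required")
    if min_votes is None:
        min_votes = len(calls) // 2 + 1  # strict majority
    return sum(calls.values()) >= min_votes


# Hypothetical scores for one protein from four illustrative modules.
example_scores = {
    "signal_sequence": 0.91,
    "chaperone_binding": 0.35,
    "effector_domain": 0.78,
    "promoter": 0.62,
}
is_candidate = majority_vote(threshold_calls(example_scores))
```

Requiring agreement among several independent modules is one way an ensemble can lower the false-positive rate relative to a single signal-sequence model, at the cost of possibly missing effectors that only one module recognizes.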

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d718/7406222/5dc5f2922522/mSystems.00288-20-f0001.jpg
