Genomics Research Center, Harbin Medical University, Harbin, China.
BMC Genomics. 2014 Jan 21;15:50. doi: 10.1186/1471-2164-15-50.
Many bacteria deliver pathogenic proteins (effectors) into the eukaryotic cytoplasm through type IV secretion systems (T4SSs), causing disease in the host. Inherent properties of these effectors, such as their sequence diversity and the scattering of their genes throughout the genome, make it challenging to identify the full set of T4SS effectors. An effective inter-species T4SS effector prediction tool is therefore urgently needed to help discover new effectors in a variety of bacterial species, especially those with few known effectors, e.g., Helicobacter pylori.
In this study, we first manually annotated a comprehensive list of validated T4SS effectors from different bacteria and then compared their C-terminal sequential and position-specific amino acid compositions, possible motifs and structural features. Based on the observed features, we built several models to recognize T4SS effectors automatically. Three of the models performed markedly better than the others. The best of them, T4SEpre_Joint, distinguished T4SS effectors from non-effectors with a 5-fold cross-validation sensitivity of 89% at a specificity of 97% on the training datasets. An inter-species cross-prediction showed that T4SEpre_Joint could recall most known effectors from a variety of species. The inter-species prediction tool package, T4SEpre, was then used to predict new T4SS effectors in H. pylori, an important human pathogen associated with gastritis, ulcers and cancer. In total, 24 new, highly probable H. pylori T4SS effector genes were identified computationally.
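As a rough illustration of two ideas mentioned above, the following Python sketch computes an amino acid composition vector over the C-terminal end of a protein and derives sensitivity and specificity from binary predictions. It is not the T4SEpre implementation: the function names, the `window` length of 100 residues, and the toy inputs are assumptions made only for this example.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def c_terminal_composition(sequence, window=100):
    """Return the fraction of each standard amino acid in the last
    `window` residues of `sequence`.

    `window=100` is an illustrative assumption, not a value taken
    from the abstract.
    """
    tail = sequence.upper()[-window:]
    counts = Counter(tail)
    # Ignore non-standard symbols (e.g. 'X') when normalising.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels,
    where 1 = effector and 0 = non-effector."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical toy sequence; only the last 4 residues are used.
    features = c_terminal_composition("AAAAKKKK", window=4)
    print(features["K"], features["A"])

    # Hypothetical labels and predictions for four proteins.
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

A composition vector of this kind could serve as input to any standard classifier, and the two metrics correspond to the quantities reported for the 5-fold cross-validation.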
We conclude that T4SEpre, as an effective inter-species T4SS effector prediction software package, will help find new pathogenic T4SS effectors efficiently in a variety of pathogenic bacteria.