Genomics Research Center, Harbin Medical University, Harbin, China.
PLoS One. 2013;8(3):e58173. doi: 10.1371/journal.pone.0058173. Epub 2013 Mar 5.
Type III secretion systems (T3SSs) play important roles in the interactions between gram-negative bacteria and their hosts. T3SSs function by translocating a group of bacterial effector proteins into the host cytoplasm. The details of the type III secretion process are yet to be clarified. This research compared the amino acid composition within the N-terminal 100 amino acids of proteins carrying type III secretion (T3S) signals and of non-T3S proteins, specifically testing whether each residue constrains the residues found at adjacent positions. We used these comparisons to build a statistical model that quantitatively describes and effectively distinguishes T3S effectors.
In this study, amino acid composition (Aac) probability profiles, each conditional on the preceding position and the amino acid found there, were compared between the N-terminal sequences of T3S and non-T3S proteins. The profiles were generally different. A Markov model, T3_MM, was consequently designed to calculate the total Aac conditional probability difference, i.e., the likelihood ratio of a sequence being a T3S rather than a non-T3S protein. Under T3_MM, the scores of known T3S and non-T3S proteins closely approximated two distinct normal distributions. The model distinguished validated T3S from non-T3S proteins with a 5-fold cross-validation sensitivity of 83.9% at a specificity of 90.3%. Compared with other T3S protein prediction models, T3_MM was also shown to be more robust, more accurate, simpler, and more statistically quantitative. The high effectiveness of T3_MM also indicates an overall Aac difference between the N-termini of T3S and non-T3S proteins, and that the Aac at each position is constrained by the preceding position and its amino acid.
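The likelihood-ratio idea described above can be illustrated with a minimal Python sketch. It is not the T3_MM R package or its interface. It assumes a position-independent first-order Markov chain, add-one pseudocounts, and natural-log ratios, none of which are specified in the abstract. The function names `train_transitions` and `log_likelihood_ratio` and the toy sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_transitions(seqs, n_term=100, pseudocount=1.0):
    """Estimate P(current residue | previous residue) from N-terminal regions.

    Assumption: one transition table shared by all positions, smoothed with
    a pseudocount so that no probability is zero.
    """
    counts = {a: {b: pseudocount for b in AMINO_ACIDS} for a in AMINO_ACIDS}
    for seq in seqs:
        region = seq[:n_term]
        for prev, cur in zip(region, region[1:]):
            # Skip non-standard residue codes such as X or B.
            if prev in counts and cur in counts[prev]:
                counts[prev][cur] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {cur: c / total for cur, c in row.items()}
    return probs


def log_likelihood_ratio(seq, t3s_model, non_t3s_model, n_term=100):
    """Sum of log[P_T3S(cur | prev) / P_nonT3S(cur | prev)] over the N-terminus.

    Positive values favor the T3S model, negative values the non-T3S model.
    """
    region = seq[:n_term]
    score = 0.0
    for prev, cur in zip(region, region[1:]):
        if prev in t3s_model and cur in t3s_model[prev]:
            score += math.log(t3s_model[prev][cur] / non_t3s_model[prev][cur])
    return score


# Toy training data, invented for illustration only (not real effectors).
toy_t3s = ["MSSSSSPSSS", "MSPSSSSNSS"]
toy_non_t3s = ["MKLLLLALLL", "MALLKLLLAL"]

t3s_model = train_transitions(toy_t3s)
non_t3s_model = train_transitions(toy_non_t3s)
print(log_likelihood_ratio("MSSPSS", t3s_model, non_t3s_model))
```

In a sketch like this, a sequence would be called a T3S candidate when its score exceeds a threshold chosen on held-out data, which is where a sensitivity and specificity trade-off such as the one reported would be set.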
An R package for T3_MM is freely downloadable from: http://biocomputer.bio.cuhk.edu.hk/softwares/T3_MM. T3_MM web server: http://biocomputer.bio.cuhk.edu.hk/T3DB/T3_MM.php.