Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan.
PLoS One. 2011 Mar 9;6(3):e17331. doi: 10.1371/journal.pone.0017331.
Ubiquitin (Ub) is a small protein of 76 amino acids with a molecular mass of about 8.5 kDa. In ubiquitin conjugation, ubiquitin is attached mainly to lysine residues of the substrate protein by Ub-ligating (E3) enzymes. Three major enzymes participate in ubiquitin conjugation: E1, E2 and E3, which are responsible for activating, conjugating and ligating ubiquitin, respectively. Ubiquitin conjugation in eukaryotes is an important mechanism in the proteasome-mediated degradation of proteins and in the regulation of transcription factor activity. Motivated by the importance of ubiquitin conjugation in biological processes, this investigation develops a method, UbSite, which uses an efficient radial basis function (RBF) network to identify protein ubiquitin conjugation (ubiquitylation) sites. This work investigates not only the amino acid composition but also the structural characteristics, physicochemical properties, and evolutionary information of amino acids around ubiquitylation (Ub) sites. With reference to the pathway of ubiquitin conjugation, the substrate sites for E3 recognition, which are distant from ubiquitylation sites, are also investigated. F-score measurements over a large window (-20 to +20) revealed statistically significant amino acid composition and position-specific scoring matrix (evolutionary information) features, most of which are located distant from Ub sites. This distant information can be used effectively to differentiate Ub sites from non-Ub sites. As determined by five-fold cross-validation, the model trained on the combination of amino acid composition and evolutionary information performs best in identifying ubiquitin conjugation sites. The prediction sensitivity, specificity, and accuracy are 65.5%, 74.8%, and 74.5%, respectively.
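For readers less familiar with the three reported measures, the following is a minimal sketch of how sensitivity, specificity, and accuracy are conventionally computed from confusion-matrix counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not the data behind the figures reported above.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    tp: Ub sites predicted as Ub sites
    fn: Ub sites predicted as non-Ub sites
    tn: non-Ub sites predicted as non-Ub sites
    fp: non-Ub sites predicted as Ub sites
    """
    sensitivity = tp / (tp + fn)                # fraction of true sites recovered
    specificity = tn / (tn + fp)                # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # fraction of all calls that are correct
    return sensitivity, specificity, accuracy


# Illustrative counts only (not taken from the paper).
sens, spec, acc = classification_metrics(tp=2, fn=2, tn=3, fp=1)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Accuracy is a weighted average of sensitivity and specificity, so when non-Ub sites outnumber Ub sites in the evaluation set, accuracy lies close to specificity.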
Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that the integration of distant sequence features of Ub sites can improve predictive performance. Additionally, the independent test demonstrates that the proposed method can outperform other ubiquitylation prediction tools.
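The windowing and F-score ranking described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the UbSite implementation: the abstract does not give the F-score formula, so the sketch assumes the commonly used feature-ranking definition (squared deviations of the class means from the overall mean, divided by the sum of the per-class sample variances). The function names, the padding symbol, and the toy sequence are hypothetical.

```python
from statistics import mean, variance

PAD = "-"  # hypothetical symbol for positions beyond the sequence ends


def lysine_windows(sequence, flank=20):
    """Yield (1-based position, window) for every lysine (K) in the sequence.

    Each window spans -flank..+flank around the lysine, so it always has
    2 * flank + 1 characters, with the lysine at the centre.
    """
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i + 1, padded[i : i + 2 * flank + 1]


def f_score(pos_values, neg_values):
    """F-score of one feature (assumed definition, see lead-in).

    pos_values: feature values at Ub sites; neg_values: at non-Ub sites.
    Each list needs at least two values, and the two sample variances
    must not both be zero.
    """
    pos, neg = list(pos_values), list(neg_values)
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    return between / within


# Toy usage with a made-up sequence and a small flank for readability.
for position, window in lysine_windows("MKAK", flank=2):
    print(position, window)
```

A larger F-score indicates that a feature, such as the frequency of one residue type at one window position, separates Ub sites from non-Ub sites more clearly.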