Qiu Wangren, Xu Chunhui, Xiao Xuan, Xu Dong
Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333046, China.
Informatics Institute, University of Missouri, Columbia, MO 65201, USA.
Curr Genomics. 2019 Aug;20(5):389-399. doi: 10.2174/1389202919666191014091250.
Ubiquitination, as a post-translational modification, is a crucial biological process in cell signaling, apoptosis, and localization. Identification of ubiquitination proteins is of fundamental importance for understanding the molecular mechanisms in biological systems and diseases. Although high-throughput experimental studies using mass spectrometry have identified many ubiquitination proteins and ubiquitination sites, the vast majority of ubiquitination proteins remain undiscovered, even in well-studied model organisms.
To reduce experimental costs, computational methods have been introduced to predict ubiquitination sites, but their accuracy remains unsatisfactory. Predicting whether a protein can be ubiquitinated at all would help in predicting its ubiquitination sites. However, the computational methods available so far predict only ubiquitination sites.
In this study, we developed the first computational method for predicting ubiquitination proteins that does not rely on ubiquitination site prediction. The method extracts features from sequence conservation information through a grey system model, and also uses functional domain annotation and subcellular localization.
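As a rough illustration of how a grey system model can turn a series of conservation scores into a small set of features, the sketch below fits the classic GM(1,1) model by least squares and returns its two parameters. The abstract does not state which grey model is used or how it is applied, so the choice of GM(1,1), the function name `gm11_params`, and the example input are assumptions made only for illustration.

```python
def gm11_params(x0):
    """Estimate the GM(1,1) parameters (a, b) for a sequence x0.

    Assumption: GM(1,1) stands in for the grey system model mentioned
    in the abstract, which does not specify the model actually used.
    The model equation is x0[k] + a * z[k] = b for k = 1..n-1, where
    z[k] is the mean of consecutive accumulated values.
    """
    if len(x0) < 3:
        raise ValueError("need at least 3 values to fit GM(1,1)")

    # Accumulated generating operation: running sum of the series.
    x1 = []
    total = 0.0
    for value in x0:
        total += value
        x1.append(total)

    # Background values: mean of neighbouring accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = [float(v) for v in x0[1:]]

    # Ordinary least squares for y = m * z + b, then a = -m.
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    var_z = sum((zi - z_mean) ** 2 for zi in z)
    if var_z == 0:
        raise ValueError("background values are constant; cannot fit")
    cov_zy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    m = cov_zy / var_z
    b = y_mean - m * z_mean
    return -m, b


# Hypothetical column of non-negative conservation scores.
a, b = gm11_params([2.0, 2.0, 2.0, 2.0])
print(a, b)
```

In a feature-extraction setting, the fitted parameters for each score series could be concatenated into a fixed-length vector regardless of protein length, which is one common motivation for this kind of model.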
Combined with feature analysis and the Relief feature selection algorithm, 5-fold cross-validation on three datasets achieved an accuracy of 90.13%, with a Matthews correlation coefficient of 80.34%. On an independent test dataset, the method achieved an accuracy of 87.71% and a Matthews correlation coefficient of 75.43%, better than the prediction from the best ubiquitination site prediction tool available.
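For readers unfamiliar with the two reported metrics, the following minimal sketch computes accuracy and the Matthews correlation coefficient from the four cells of a binary confusion matrix. The function name `accuracy_and_mcc` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts, chosen only to illustrate the calculation.
acc, mcc = accuracy_and_mcc(tp=90, tn=90, fp=10, fn=10)
print(f"accuracy = {acc:.2%}, MCC = {mcc:.2%}")
```

The coefficient ranges from -1 to 1; expressing it as a percentage, as the abstract does, simply multiplies that value by 100.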
Our study may guide experimental design and provide useful insights for studying the mechanisms and modulation of ubiquitination pathways. The code is available at: https://github.com/Chunhuixu/UBIPredic_QWRCHX.