College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453000, China.
Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, China.
Plant Mol Biol. 2022 Sep;110(1-2):81-92. doi: 10.1007/s11103-022-01288-3. Epub 2022 Jul 1.
We extract three kinds of important features from Arabidopsis thaliana protein sequences (protein secondary structure based on the Chou-Fasman parameters, amino acid hydrophobicity, and polarity information) and analyze their properties. Ubiquitination is an important post-translational modification of proteins that participates in the regulation of many important cellular activities. At present, ubiquitination proteomics research is mostly concentrated in animals and yeasts, while relatively few studies have been carried out in plants. The computational prediction of Arabidopsis thaliana ubiquitination sites is therefore still in its infancy. To address this, we describe a computational method, PseAraUbi (Prediction of Arabidopsis thaliana ubiquitination sites using pseudo amino acid composition), that can effectively detect ubiquitination sites in Arabidopsis thaliana using a support vector machine classifier. Based on protein sequence information, features are extracted from the Chou-Fasman parameters, amino acid hydrophobicity, and polarity information, and are then selected for classification with the Boruta algorithm. PseAraUbi achieves promising performance, with an AUC score of 0.953 under fivefold cross-validation on the training dataset, which is significantly better than that of the pioneering Arabidopsis thaliana ubiquitination site prediction method. We also demonstrated the ability of the proposed method on independent test sets, where it showed a competitive advantage. In addition, we analyzed in depth the physicochemical properties of amino acids in the region adjacent to the ubiquitination site. To facilitate the community, the source code, optimal feature subset, and ubiquitination site dataset for the Arabidopsis proteome are available on GitHub (https://github.com/HNUBioinformatics/PseAraUbi.git) for interested users.
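To make the sequence-based feature extraction more concrete, the following is a minimal sketch, not the PseAraUbi implementation. It cuts a fixed-length window around each lysine (K), the residue at which ubiquitination occurs, and encodes every position by a hydrophobicity value and a polarity class. The abstract does not name a hydrophobicity scale, polarity grouping, or window size, so these are assumptions: the Kyte-Doolittle hydropathy index, a common four-class textbook polarity grouping, and a `flank` parameter. The function names `lysine_windows` and `encode_window` are hypothetical. Chou-Fasman secondary-structure propensities could be added as a further per-residue lookup table in the same way.

```python
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy index. ASSUMPTION: the abstract does not say
# which hydrophobicity scale PseAraUbi uses.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Four-class polarity grouping. ASSUMPTION: a common textbook grouping,
# not necessarily the one used in the paper.
# 0 = nonpolar, 1 = polar uncharged, 2 = positively charged, 3 = negatively charged
POLARITY_CLASS: Dict[str, int] = {}
for residues, label in (("AVLIMFWPG", 0), ("STCYNQ", 1), ("KRH", 2), ("DE", 3)):
    for residue in residues:
        POLARITY_CLASS[residue] = label

N_POLARITY_CLASSES = 4
PAD = "X"  # placeholder for positions that fall outside the sequence


def lysine_windows(sequence: str, flank: int) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine in the sequence.

    Each window has length 2 * flank + 1 and is centred on the lysine;
    positions beyond either end of the sequence are filled with PAD.
    """
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i sits at index i + flank in the padded string.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


def encode_window(window: str) -> List[float]:
    """Encode a window as hydropathy values followed by one-hot polarity classes.

    Padding and unknown residues get a hydropathy of 0.0 and an all-zero
    polarity vector.
    """
    hydropathy = [KYTE_DOOLITTLE.get(residue, 0.0) for residue in window]
    polarity: List[float] = []
    for residue in window:
        one_hot = [0.0] * N_POLARITY_CLASSES
        if residue in POLARITY_CLASS:
            one_hot[POLARITY_CLASS[residue]] = 1.0
        polarity.extend(one_hot)
    return hydropathy + polarity


if __name__ == "__main__":
    # Toy sequence for illustration only, not taken from the dataset.
    for position, window in lysine_windows("MKTAYK", flank=2):
        print(position, window, encode_window(window)[:5])
```

Vectors of this kind, one per candidate lysine, are the sort of input that a feature-selection step such as Boruta and a support vector machine classifier would then operate on.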