Computational Prediction of Protein Arginine Methylation Based on Composition-Transition-Distribution Features
Authors
Hou Ruiyan, Wu Jin, Xu Lei, Zou Quan, Wu Yi-Jun
Affiliations
Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China.
College of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China.
Publication information
ACS Omega. 2020 Oct 19;5(42):27470-27479. doi: 10.1021/acsomega.0c03972. eCollection 2020 Oct 27.
Arginine methylation is one of the most essential protein post-translational modifications, and identifying arginine methylation sites is a critical problem in biological research. Unfortunately, biological experiments such as mass spectrometry are expensive and time-consuming. Predicting arginine methylation by machine learning is therefore a fast and efficient alternative. In this paper, we focus on the systematic characterization of arginine methylation with composition-transition-distribution (CTD) features. The presented framework consists of three stages. In the first stage, we extract CTD features from 1750 samples and use a decision tree for prediction, reaching an accuracy of 96%. In the second stage, a support vector machine predicts the number of arginine methylation sites with an R-squared of 0.36. In the third stage, experiments on the updated arginine methylation site data set show that CTD features combined with a random forest classifier outperform previous methods. The identification accuracy reaches 82.1% and 82.5% on the single methylarginine and double methylarginine data sets, respectively. The findings presented in this paper can be helpful for future research on arginine methylation.
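To make the CTD idea concrete, the following is a minimal sketch of how composition, transition, and distribution descriptors might be computed for one physicochemical property. The three-class hydrophobicity grouping, the function name `ctd_features`, and the quantile indexing rule are assumptions chosen for illustration; the abstract does not state which properties or implementation the authors used.

```python
import math

# Assumed three-class hydrophobicity grouping often used with CTD descriptors.
# The abstract does not specify the paper's property set; this is illustrative.
GROUPS = {
    "1": "RKEDQN",   # polar
    "2": "GASTPHY",  # neutral
    "3": "CLVIMFW",  # hydrophobic
}
AA_TO_CLASS = {aa: c for c, aas in GROUPS.items() for aa in aas}


def ctd_features(sequence):
    """Return 21 CTD values (3 composition, 3 transition, 15 distribution)."""
    # Map residues to class labels, skipping non-standard residues.
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    n = len(classes)
    if n < 2:
        raise ValueError("need at least two standard residues")

    # Composition: fraction of residues in each class.
    composition = [classes.count(c) / n for c in "123"]

    # Transition: fraction of adjacent pairs that switch between two classes,
    # counted in either direction.
    pairs = list(zip(classes, classes[1:]))
    transition = []
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        count = sum(1 for p in pairs if p in ((a, b), (b, a)))
        transition.append(count / (n - 1))

    # Distribution: relative position of the first, 25%, 50%, 75%, and last
    # occurrence of each class along the sequence.
    distribution = []
    for c in "123":
        positions = [i + 1 for i, x in enumerate(classes) if x == c]
        if not positions:
            distribution.extend([0.0] * 5)
            continue
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            k = max(1, math.floor(frac * len(positions)))  # 1-based rank
            distribution.append(positions[k - 1] / n)

    return composition + transition + distribution


features = ctd_features("RGCRGC")
```

A vector like `features` could then be passed to a classifier such as a decision tree or random forest, as the abstract describes; repeating the computation over several property groupings and concatenating the results is a common way to build a longer descriptor.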