Identify and analysis crotonylation sites in histone by using support vector machines.

Author information

Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, 333403, China; Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.

Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, 333403, China.

Publication information

Artif Intell Med. 2017 Nov;83:75-81. doi: 10.1016/j.artmed.2017.02.007. Epub 2017 Mar 7.

Abstract

OBJECTIVE

Lysine crotonylation (Kcr) is a newly discovered histone posttranslational modification, which is specifically enriched at active gene promoters and potential enhancers in mammalian cell genomes. Although lysine crotonylation sites can be correctly identified with high-resolution mass spectrometry, the experimental methods are time-consuming and expensive. Therefore, it is necessary to develop computational methods to deal with this problem.

METHODS

We proposed a new encoding scheme, named position weight amino acid composition, to extract sequence information from the histone regions surrounding crotonylation sites. Protein data were taken from the UniProt database. A series of steps was used to construct a strict and objective benchmark dataset for training and testing the proposed method. All samples were characterized by a large number of features derived from position weight amino acid composition. A support vector machine was used to perform the classification.
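The abstract does not state the formula behind position weight amino acid composition, so the following is only a minimal sketch of what such an encoding could look like. It assumes a peptide window of odd length centered on the candidate lysine, and it assumes that a residue at offset j from the center contributes j + |j|/L, normalized by L(L + 1), where L is the number of residues on each side. The function name `pwaa_encode`, the constant `AMINO_ACIDS`, and the weighting itself are illustrative assumptions, not the authors' published definition.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one feature each


def pwaa_encode(window: str) -> list[float]:
    """Encode a peptide window into 20 position-weighted composition features.

    ASSUMPTION (not from the abstract): a residue at offset j from the center
    contributes j + |j| / L, and each feature is normalized by L * (L + 1),
    where L is the number of residues on each side of the center.
    """
    if len(window) < 3 or len(window) % 2 == 0:
        raise ValueError("window must have odd length of at least 3")
    half = len(window) // 2  # L: residues on each side of the central site
    features = []
    for amino_acid in AMINO_ACIDS:
        total = 0.0
        for index, residue in enumerate(window):
            offset = index - half  # j runs from -L to +L
            if residue == amino_acid:
                total += offset + abs(offset) / half
        features.append(total / (half * (half + 1)))
    return features
```

Under these assumptions, a call such as `pwaa_encode("ARTKQTARKSTGG")` (a made-up 13-residue window) returns a 20-element vector that could be passed to a classifier. Unlike plain amino acid composition, the result depends on where each residue sits relative to the central site, which is the property the scheme's name suggests.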

RESULTS

Based on a series of experiments, we found that the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) were 71.69%, 98.7%, 94.43%, and 0.778, respectively, in jackknife cross-validation. Comparison results demonstrated that the proposed model outperformed a random forest algorithm. We also performed feature analysis on the samples.
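For readers unfamiliar with the evaluation protocol, the sketch below shows how jackknife (leave-one-out) cross-validation and the four reported metrics are commonly computed. The metric formulas are the standard confusion-matrix definitions. The classifier is deliberately a stand-in: a nearest-centroid rule replaces the support vector machine so that the example needs no external library, and all function names here are hypothetical.

```python
import math


def jackknife_predictions(samples, labels, fit, predict):
    """Hold out each sample in turn, train on the rest, predict the held-out one."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions


def binary_metrics(labels, predictions):
    """Return Sn, Sp, Acc and MCC for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": tp / (tp + fn) if tp + fn else 0.0,
        "Sp": tn / (tn + fp) if tn + fp else 0.0,
        "Acc": (tp + tn) / len(pairs) if pairs else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Stand-in classifier (NOT the SVM from the paper): nearest class centroid.
def fit_centroids(train_x, train_y):
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(column) / len(rows) for column in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=squared_distance)
```

With feature vectors in `samples` and 0/1 labels in `labels`, calling `binary_metrics(labels, jackknife_predictions(samples, labels, fit_centroids, predict_centroid))` yields the four scores. A gap between Sn and Sp like the one reported here is typical when negative samples outnumber positives, which is one reason MCC is reported alongside accuracy.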

CONCLUSION

Identification of Kcr sites in histones is an indispensable step for decoding protein function. Therefore, the method can promote a deeper understanding of the physiological roles of crotonylation and provide useful information for developing drugs to treat various diseases associated with crotonylation.
