LipoSVM: Prediction of Lysine Lipoylation in Proteins based on the Support Vector Machine.

Author information

Wu Meiqi, Lu Pengchao, Yang Yingxi, Liu Liwen, Wang Hui, Xu Yan, Chu Jixun

Affiliations

Department of Applied Mathematics, University of Science and Technology Beijing, Beijing 100083, China.

Equipment Leasing Company of China Petroleum Pipeline Engineering Co., Ltd. 065000 Langfang City, Hebei Province, China.

Publication information

Curr Genomics. 2019 Aug;20(5):362-370. doi: 10.2174/1389202919666191014092843.

Abstract

BACKGROUND

Lysine lipoylation, a rare and highly conserved post-translational modification of proteins, has been considered one of the most important processes in biology. Identifying lipoylated lysine sites is the key to a comprehensive understanding of its regulatory mechanism. Because experimental methods are expensive and laborious, computational methods for predicting lipoylation sites are urgently needed.

METHODOLOGY

In this work, a predictor named LipoSVM is developed to predict lipoylation sites accurately. To address the imbalance between classes, the synthetic minority over-sampling technique (SMOTE) is used to balance the positive and negative samples. In addition, training sets with different ratios of positive to negative samples are used.
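
The balancing step can be illustrated with a minimal sketch of the SMOTE idea, using only the Python standard library. This is not the authors' implementation: the function name `smote`, the parameter values, and the toy two-dimensional feature vectors are assumptions made for illustration. Each synthetic positive is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy data (made-up values): 3 positive vs 8 negative feature vectors
positives = [[0.9, 0.8], [1.0, 1.1], [0.8, 1.0]]
negatives = [[0.1 * i, 0.05 * i] for i in range(8)]

# Oversample the positives until the two classes reach a 1:1 ratio
new_positives = smote(positives, n_new=len(negatives) - len(positives), k=2)
balanced_positives = positives + new_positives
```

Changing `n_new` gives other positive-to-negative ratios. In practice a maintained library implementation would normally be preferred over a hand-written one.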

RESULTS

After comparing five encoding schemes and five classification algorithms, the final LipoSVM model combines position-specific scoring matrix encoding with a support vector machine, trained on a set with a 1:1 ratio of positive to negative samples. The best model achieves an accuracy of 99.98% and an AUC of 0.9996 in 10-fold cross-validation. The AUC on the independent test set reaches 0.9997, which demonstrates the robustness of LipoSVM. A comparison of lysine lipoylation and non-lipoylation fragments shows statistically significant differences.
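
As a reminder of what these figures measure, the sketch below shows one way to form 10 folds and to compute accuracy and AUC from classifier scores. It is an illustration under stated assumptions, not the evaluation code of the paper: the function names, the decision threshold of 0, the unshuffled and unstratified fold assignment, and the toy labels and scores are all hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold i takes every k-th sample from i.

    Simplified: no shuffling and no stratification by class.
    """
    for i in range(k):
        test = list(range(i, n_samples, k))
        train = [j for j in range(n_samples) if j % k != i]
        yield train, test

def accuracy(labels, scores, threshold=0.0):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This equals the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = lipoylated site) and made-up decision scores
labels = [1, 1, 1, 0, 0, 0]
scores = [2.1, 0.7, -0.2, 0.3, -1.5, -0.9]

folds = list(k_fold_indices(20, k=10))
acc = accuracy(labels, scores)
area = auc(labels, scores)
```

In a full cross-validation run, the classifier would be refit on each `train` split and scored on the matching `test` split. The metrics are then averaged over the folds or computed on the pooled test predictions.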

CONCLUSION

A good predictor for lysine lipoylation is built on the position-specific scoring matrix and the support vector machine. The LipoSVM webserver can be freely downloaded from https://github.com/stars20180811/LipoSVM.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a47/7235397/178b7c5e4f38/CG-20-362_F1.jpg
