Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, School of Pharmacy, Xuzhou Medical University, Xuzhou, 221000, Jiangsu, China.
BMC Genomics. 2020 Nov 23;21(1):812. doi: 10.1186/s12864-020-07166-w.
Malonylation is a recently discovered post-translational modification associated with a variety of diseases, including type 2 diabetes mellitus and several types of cancer. Compared with experimental identification of malonylation sites, computational methods are time-efficient and comparatively inexpensive.
In this study, we propose a novel computational model, Mal-Prec (Malonylation Prediction), which predicts malonylation sites by combining Principal Component Analysis (PCA) with a Support Vector Machine (SVM). Sequence features were first extracted using one-hot encoding, physicochemical properties, and the composition of k-spaced amino acid pairs. PCA was then applied to select optimal feature subsets, and SVM was used to predict malonylation sites. Five-fold cross-validation results showed that Mal-Prec achieves better prediction performance than other approaches. The area under the receiver operating characteristic curve (AUC) reached 96.47% and 90.72% on five-fold cross-validation of independent data sets, respectively.
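To make two of the sequence encodings named above concrete, the following is a minimal Python sketch of one-hot encoding and the composition of k-spaced amino acid pairs for a single peptide window. It is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names `one_hot` and `cksaap`, the 20-letter alphabet ordering, the `k_max` default, and the example peptide are all hypothetical. The physicochemical-property features and the PCA and SVM stages are not shown.

```python
from itertools import product

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """Encode each residue as a 20-dimensional indicator vector.

    Residues outside AMINO_ACIDS (for example a padding symbol) map to
    an all-zero block.
    """
    vec = []
    for res in peptide:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec


def cksaap(peptide, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count every pair (peptide[i], peptide[i + k + 1]) and
    divide by the number of pair positions, giving 400 frequencies per k.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_positions = len(peptide) - k - 1
        for i in range(n_positions):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # skip pairs containing unknown residues
                counts[pair] += 1
        features.extend(
            counts[p] / n_positions if n_positions > 0 else 0.0 for p in pairs
        )
    return features


# Hypothetical 5-residue window, used only to show the output shapes.
window = "AKAKA"
feature_vector = one_hot(window) + cksaap(window, k_max=1)
print(len(feature_vector))  # 5 * 20 one-hot values + 2 * 400 pair frequencies
```

In a pipeline of the kind the abstract describes, vectors like `feature_vector` would be stacked into a matrix, reduced with PCA, and passed to an SVM classifier.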
Mal-Prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools and can serve as a useful tool for discovering novel malonylation sites in human proteins. Mal-Prec is coded in MATLAB and is publicly available at https://github.com/flyinsky6/Mal-Prec, together with the data sets used in this study.