School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China.
BMC Bioinformatics. 2019 Jun 17;20(1):346. doi: 10.1186/s12859-019-2938-7.
Lysine acetylation is a widespread, reversible post-translational modification that plays a crucial role in a range of biological processes. To better understand its mechanism, acetylation sites in proteins must be identified accurately. Computational methods are popular for this task because they are more convenient and faster than experimental methods. In this study, we proposed a new computational method to predict acetylation sites in human proteins by combining sequence and structural features, including physicochemical properties (PCP), position-specific scoring matrix (PSSM), auto covariation (AC), residue composition (RC), secondary structure (SS) and accessible surface area (ASA), which together characterize acetylated lysine sites well. In addition, a two-step feature selection combining minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS) was applied. Finally, a cascade classifier based on support vector machines (SVM) was trained, which resolves the imbalance between positive and negative samples while still covering all of the negative-sample information.
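As a rough illustration of the two-step selection, the sketch below ranks features with the mRMR difference criterion (mutual information with the label minus mean mutual information with already-selected features) and then runs IFS over the ranked list, keeping the prefix with the best score. It assumes the features are already discretized (continuous PCP or PSSM values would need binning first), and `evaluate` is a hypothetical callback standing in for whatever classifier score IFS uses (for example, a cross-validated accuracy). The function names are illustrative, not the authors' code.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr_score(j: int, features: List[Sequence], relevance: List[float],
               selected: List[int]) -> float:
    """MID criterion: relevance minus mean redundancy with selected features."""
    if not selected:
        return relevance[j]
    redundancy = sum(mutual_information(features[j], features[s])
                     for s in selected) / len(selected)
    return relevance[j] - redundancy


def mrmr_rank(features: List[Sequence], labels: Sequence) -> List[int]:
    """Step 1: order all feature indices greedily by the mRMR criterion."""
    relevance = [mutual_information(f, labels) for f in features]
    remaining = list(range(len(features)))
    ranked: List[int] = []
    while remaining:
        best = max(remaining,
                   key=lambda j: mrmr_score(j, features, relevance, ranked))
        ranked.append(best)
        remaining.remove(best)
    return ranked


def ifs(ranked: List[int], evaluate: Callable[[List[int]], float]) -> List[int]:
    """Step 2: score the top-1, top-2, ... subsets and keep the best prefix."""
    best_subset, best_score = ranked[:1], float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:
            best_subset, best_score = ranked[:k], score
    return best_subset
```

In a toy case where feature 1 duplicates feature 0 and feature 2 carries independent information, mRMR places feature 2 in the top two, ahead of the redundant copy, even though all three have the same relevance to the label.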
On an independent dataset, the method achieved a specificity of 72.19% and a sensitivity of 76.71%, which shows that the cascade SVM classifier outperforms a single SVM classifier.
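One way a cascade can cover all negative samples while training each stage on balanced data is sketched below. The abstract does not specify how the stages are arranged, so this is an assumption. The negatives are split into chunks about the size of the positive set, one classifier is trained per chunk, and a site is called acetylated only if every stage calls it positive. To stay dependency-free, a nearest-centroid classifier stands in for the SVM. In practice you would pass a training function wrapping an SVM (for example scikit-learn's `SVC`) instead. Sensitivity is TP/(TP+FN) and specificity is TN/(TN+FP). Names such as `train_cascade` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]  # returns 1 (acetylated) or 0
Trainer = Callable[[List[Vector], List[Vector]], Classifier]


def train_nearest_centroid(pos: List[Vector], neg: List[Vector]) -> Classifier:
    """Stand-in for an SVM: assign a sample to the closer class centroid."""
    def centroid(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: int(math.dist(x, cp) <= math.dist(x, cn))


def train_cascade(pos: List[Vector], neg: List[Vector],
                  train: Trainer = train_nearest_centroid) -> List[Classifier]:
    """Split negatives into ~|pos|-sized chunks so every stage sees balanced
    data and every negative sample is used by exactly one stage."""
    k = max(1, math.ceil(len(neg) / len(pos)))
    chunks = [neg[i::k] for i in range(k)]
    return [train(pos, chunk) for chunk in chunks if chunk]


def predict_cascade(stages: List[Classifier], x: Vector) -> int:
    """Positive only if every stage says positive; all() stops at the first reject."""
    return int(all(stage(x) == 1 for stage in stages))


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

Requiring agreement from every stage trades some sensitivity for specificity. A majority vote over the stages is a common alternative when the opposite trade-off is preferred.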
Beyond the experimental results, we also performed a systematic and comprehensive analysis of the acetylation data.