Gao JianZhao, Tao Xue-Wen, Zhao Jia, Feng Yuan-Ming, Cai Yu-Dong, Zhang Ning
School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
Department of Biomedical Engineering, Tianjin Key Lab of Biomedical Engineering Measurement, Tianjin University, Tianjin, China.
Comb Chem High Throughput Screen. 2017;20(7):629-637. doi: 10.2174/1386207320666170314093216.
Lysine acetylation, a type of post-translational modification (PTM), plays key roles in cellular regulation and may be involved in a variety of human diseases. However, identifying lysine acetylation sites with traditional experimental approaches is often costly and time-consuming, so effective computational methods for predicting acetylation sites are needed. In this study, we developed a position-specific method for predicting epsilon lysine acetylation sites.
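Position-specific site predictors typically represent each candidate lysine as a fixed-length peptide window centered on that residue, and the position-specific features are then computed per window position. The abstract does not state the window length or how protein termini are handled, so the sketch below assumes a hypothetical `half_width` of 10 residues (a 21-residue window) and pads past the protein ends with `X`; the function name `lysine_windows` is likewise illustrative.

```python
def lysine_windows(sequence: str, half_width: int = 10, pad: str = "X") -> list[tuple[int, str]]:
    """Return (1-based position, peptide) for every lysine (K) in the sequence.

    Assumption: each peptide spans half_width residues on both sides of the K,
    and positions beyond the protein termini are filled with the pad character.
    """
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the sequence sits at index i + half_width of `padded`,
            # so the centered window starts at index i and has length 2*half_width + 1.
            windows.append((i + 1, padded[i:i + 2 * half_width + 1]))
    return windows


print(lysine_windows("MKAK", half_width=2))  # [(2, 'XMKAK'), (4, 'KAKXX')]
```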
Sequences of acetylated proteins were retrieved from the UniProt database. Various features, including the position-specific scoring matrix (PSSM), amino acid factors (AAF), and protein disorder, were incorporated. A feature selection method based on mRMR (Maximum Relevance Minimum Redundancy) and IFS (Incremental Feature Selection) was employed.
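The two-stage selection can be illustrated as follows: mRMR orders the features by their relevance to the label minus their average redundancy with the features already ranked, and IFS then scores growing prefixes of that ordering and keeps the best one. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the MID (difference) form of mRMR, assumes the features are already discretized, estimates mutual information from raw counts, and delegates IFS scoring to a caller-supplied `evaluate` callback (in practice, for example, a cross-validated classifier score). All function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(columns, labels):
    """Greedy mRMR ranking with the MID criterion: relevance minus mean redundancy.

    columns[j] holds the (already discretized) values of feature j for every sample.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    remaining, ranked = list(range(len(columns))), []

    def mid_score(j):
        if not ranked:
            return relevance[j]
        redundancy = sum(mutual_information(columns[j], columns[k]) for k in ranked)
        return relevance[j] - redundancy / len(ranked)

    while remaining:
        best = max(remaining, key=mid_score)  # ties go to the lower feature index
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, evaluate):
    """IFS: score the top-1, top-2, ... prefixes of the mRMR list; keep the best prefix."""
    best_subset, best_score = [], float("-inf")
    for size in range(1, len(ranked) + 1):
        score = evaluate(ranked[:size])
        if score > best_score:
            best_subset, best_score = ranked[:size], score
    return best_subset, best_score
```

With a feature duplicated (columns 0 and 1 identical), mRMR ranks the independent but equally relevant feature (column 2) ahead of the duplicate, which is exactly the redundancy penalty the method is meant to apply.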
Finally, 319 optimal features were selected from a total of 541 features. Using these 319 features to encode peptides, a predictor was constructed based on Dagging. The predictor achieved an accuracy of 69.56% with a Matthews correlation coefficient (MCC) of 0.2792. Analysis of the optimal features revealed several important factors that determine lysine acetylation sites.
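Dagging trains a copy of a base classifier on each of several disjoint, stratified partitions of the training data and combines their predictions by majority vote; MCC summarizes the resulting binary confusion matrix. The abstract names neither the base classifier nor the number of partitions, so the sketch below uses a hypothetical nearest-centroid learner and `n_folds=5` purely as placeholders. Accuracy, the other reported metric, is (TP + TN) / N on the same confusion-matrix counts.

```python
import math
import random
from collections import Counter, defaultdict


def nearest_centroid_fit(X, y):
    """Hypothetical stand-in base learner: predicts the class with the closest mean vector."""
    counts, sums = Counter(y), {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def dagging_fit(X, y, n_folds=5, fit=nearest_centroid_fit, seed=0):
    """Dagging: split the training set into disjoint stratified folds,
    fit one base model per fold, and combine the models by majority vote."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds, position = [[] for _ in range(n_folds)], 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:  # deal round-robin so each fold roughly keeps the class ratio
            folds[position % n_folds].append(idx)
            position += 1
    models = [fit([X[i] for i in f], [y[i] for i in f]) for f in folds if f]

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict


def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Because each base model sees only about 1/`n_folds` of the training data, Dagging is often paired with base learners whose training cost grows faster than linearly in the number of samples.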
We developed a position-specific method for epsilon lysine acetylation site prediction and selected a set of optimal features. Analysis of these features provided insights into the mechanism of lysine acetylation and guidance for experimental validation.