iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features.

Affiliations

Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan.

Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan.

Publication information

BMC Bioinformatics. 2020 Dec 9;21(1):568. doi: 10.1186/s12859-020-03916-5.

Abstract

BACKGROUND

Protein phosphoglycerylation is the addition of 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form 3-phosphoglyceryl-lysine. It is a reversible, non-enzymatic post-translational modification (PTM) that plays a regulatory role in glucose metabolism and glycolysis. The number of experimentally verified phosphoglycerylation sites has increased significantly, so statistical and machine learning methods have become essential for investigating the characteristics of these sites. However, research into phosphoglycerylation remains very limited, and only a few resources are available for the computational identification of phosphoglycerylation sites.

RESULTS

We present a bioinformatics investigation of phosphoglycerylation sites based on sequence-based features. TwoSampleLogo analysis reveals that the regions surrounding phosphoglycerylation sites contain a relatively high proportion of positively charged amino acids, especially in the upstream flanking region. Additionally, PTM-Logo analysis shows that non-polar and aliphatic amino acids are more abundant around phosphoglycerylated lysines, which may help discriminate between phosphoglycerylation and non-phosphoglycerylation sites. Several types of features were used to build prediction models on the training dataset: amino acid composition, amino acid pair composition, position weight matrix and position-specific scoring matrix. To improve predictive power, the top features ranked by F-score were combined into a final feature set. Predictive models were then trained using decision tree (DT), random forest (RF) and support vector machine (SVM) classifiers. Evaluation by five-fold cross-validation showed that the selected features were the most effective in discriminating between phosphoglycerylated and non-phosphoglycerylated sites.
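
As a rough illustration of two of the encodings and the F-score ranking step, the Python sketch below encodes a sequence window around a candidate lysine. It uses amino acid composition (AAC) and adjacent amino acid pair composition, then ranks the concatenated features by F-score. This is not the iDPGK implementation, and the position weight matrix and PSSM encodings are not shown. Several details are assumptions: the F-score follows the widely used Chen and Lin definition, and "pair" means adjacent residues. Windows are plain strings, and non-standard characters such as padding `X` are ignored. The function names and toy windows are hypothetical.

```python
from collections import Counter
from itertools import product
from statistics import mean

# 20 standard amino acids; other characters (e.g. padding 'X') are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of adjacent residues (assumed meaning of "pair").
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(window: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue in the window."""
    counts = Counter(r for r in window if r in AMINO_ACIDS)
    n = sum(counts.values()) or 1  # avoid division by zero for empty windows
    return [counts[a] / n for a in AMINO_ACIDS]


def aapc(window: str) -> list[float]:
    """Amino acid pair composition: frequency of each adjacent residue pair."""
    pairs = (window[i:i + 2] for i in range(len(window) - 1))
    counts = Counter(p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS)
    n = sum(counts.values()) or 1
    return [counts[p] / n for p in PAIRS]


def f_score(pos: list[float], neg: list[float]) -> float:
    """F-score of one feature: between-class separation over within-class variance.

    Assumes at least two samples per class (sample variance divides by n - 1).
    """
    m_all = mean(pos + neg)
    m_pos, m_neg = mean(pos), mean(neg)
    numerator = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    var_pos = sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1)
    denominator = var_pos + var_neg
    if denominator == 0:
        # Constant within each class: perfectly separating only if the means differ.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def rank_features(pos_vecs: list[list[float]], neg_vecs: list[list[float]],
                  top_k: int) -> list[int]:
    """Indices of the top_k features by F-score, highest first."""
    n_features = len(pos_vecs[0])
    scores = [
        f_score([v[j] for v in pos_vecs], [v[j] for v in neg_vecs])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]


def encode(window: str) -> list[float]:
    """Concatenate AAC (20) and pair composition (400) into 420 features."""
    return aac(window) + aapc(window)


# Toy usage with made-up windows centred on a lysine (K); not data from the study.
positives = ["RKAKLVK", "KRLAKIV", "AKRKVLA"]
negatives = ["DEGKSTN", "SDNKEGQ", "TQEKDSG"]
top = rank_features([encode(w) for w in positives],
                    [encode(w) for w in negatives], top_k=5)
print(top)
```

The selected indices would then pick columns of the encoded vectors before training a DT, RF or SVM classifier under five-fold cross-validation.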

CONCLUSION

The SVM model trained with the selected sequence-based features performed well, with a sensitivity of 77.5%, a specificity of 73.6%, an accuracy of 74.9% and a Matthews correlation coefficient (MCC) of 0.49. The model also performed consistently on an independent testing set, with a sensitivity of 75.7% and a specificity of 64.9%. Finally, the model has been implemented as a web-based system, iDPGK, which is freely available at http://mer.hc.mmh.org.tw/iDPGK/ .
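
Sensitivity, specificity, accuracy and MCC are conventionally computed from the counts of true/false positives and negatives. The Python sketch below uses those standard definitions. The function name and the counts in the usage line are hypothetical and only illustrate the formulas. They are not figures from the study.

```python
from math import sqrt


def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # true positive rate
    sp = tn / (tn + fp)                      # true negative rate
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only.
print(evaluate(tp=8, tn=15, fp=5, fn=2))
```

Unlike accuracy, MCC stays informative when positive and negative sites are unbalanced. This is why it is often reported alongside sensitivity and specificity for PTM site predictors.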

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f3ec/7727188/2239ff29b877/12859_2020_3916_Fig1_HTML.jpg
