Computational identification of multiple lysine PTM sites by analyzing the instance hardness and feature importance.

Affiliations

Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, 6204, Bangladesh.

Computer Science and Engineering, University of Rajshahi, Rajshahi, 6205, Bangladesh.

Publication information

Sci Rep. 2021 Sep 23;11(1):18882. doi: 10.1038/s41598-021-98458-y.

DOI:10.1038/s41598-021-98458-y

PMID:34556767

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8460736/

Abstract

Identification of post-translational modifications (PTMs) is significant in the study of computational proteomics, cell biology, pathogenesis, and drug development due to their role in many bio-molecular mechanisms. Though there are several computational tools to identify individual PTMs, only three predictors have been established to predict multiple PTMs at the same lysine residue. Furthermore, detailed analysis and assessment of dataset balancing and of the significance of different feature encoding techniques for a suitable multi-PTM prediction model are still lacking. This study introduces a computational method named 'iMul-kSite' for predicting acetylation, crotonylation, methylation, succinylation, and glutarylation from an unrecognized peptide sample with one, multiple, or no modifications. After redundant data samples were eliminated from the majority class by analyzing the hardness of the sequence-coupling information, the feature representation was optimized by combining the ANOVA F-test with an incremental feature selection approach. The proposed predictor predicts multi-label PTM sites with 92.83% accuracy using the top 100 features. It also achieves a 93.36% aiming rate and a 96.23% coverage rate, both much better than those of the existing state-of-the-art predictors on the validation test. This performance indicates that 'iMul-kSite' can be used as a supportive tool for further K-PTM study. For the convenience of experimental scientists, 'iMul-kSite' has been deployed as a user-friendly web server at http://103.99.176.239/iMul-kSite.
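The abstract reports accuracy, aiming rate, and coverage rate for a multi-label predictor but does not define them. The sketch below assumes the set-based definitions commonly used in multi-label PTM studies: per sample, the number of correctly predicted labels is divided by the number of predicted labels (aiming), true labels (coverage), or labels in the union of both (accuracy), and each ratio is averaged over all samples. The function name `multilabel_scores` and the handling of sites with no modification are assumptions made for illustration, not the paper's implementation.

```python
from typing import Dict, Iterable, Sequence


def multilabel_scores(
    true_sets: Sequence[Iterable[str]],
    pred_sets: Sequence[Iterable[str]],
) -> Dict[str, float]:
    """Average set-based aiming, coverage, and accuracy over all samples."""
    if len(true_sets) != len(pred_sets) or len(true_sets) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    aiming = coverage = accuracy = 0.0
    for true_labels, pred_labels in zip(true_sets, pred_sets):
        t, p = set(true_labels), set(pred_labels)
        if not t and not p:
            # Assumption: an unmodified site predicted as unmodified
            # counts as a perfect prediction for all three measures.
            aiming += 1.0
            coverage += 1.0
            accuracy += 1.0
            continue
        hits = len(t & p)
        # Assumption: an empty denominator (nothing predicted, or nothing
        # truly present) contributes 0 instead of raising an error.
        aiming += hits / len(p) if p else 0.0
        coverage += hits / len(t) if t else 0.0
        accuracy += hits / len(t | p)

    n = len(true_sets)
    return {
        "aiming": aiming / n,
        "coverage": coverage / n,
        "accuracy": accuracy / n,
    }


# Toy example with made-up labels (not data from the paper).
true_labels = [
    {"acetylation", "succinylation"},
    {"methylation"},
    set(),
    {"crotonylation"},
]
pred_labels = [
    {"acetylation"},
    {"methylation", "glutarylation"},
    set(),
    {"crotonylation"},
]
scores = multilabel_scores(true_labels, pred_labels)
print(scores)
```

Under these assumptions, aiming penalizes spurious predicted modifications, coverage penalizes missed ones, and accuracy penalizes both, which is why the three values can differ for the same predictor.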

