CBDT-Oglyc: Prediction of O-glycosylation sites using ChiMIC-based balanced decision table and feature selection.

Author affiliations

School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411104, Hunan, P. R. China.

Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, Hunan, P. R. China.

Publication information

J Bioinform Comput Biol. 2023 Oct;21(5):2350024. doi: 10.1142/S0219720023500245. Epub 2023 Oct 28.

Abstract

O-glycosylation (Oglyc) plays an important role in various biological processes. The key to understanding the mechanisms of Oglyc is identifying the corresponding glycosylation sites. Two critical steps, feature selection and classifier design, greatly affect the accuracy of computational methods for predicting Oglyc sites. Based on an efficient feature selection algorithm and a classifier capable of handling imbalanced datasets, a new computational method, ChiMIC-based balanced decision table for O-glycosylation (CBDT-Oglyc), is proposed to predict Oglyc sites in proteins. Sequences are characterized by combining amino acid composition (AAC), undirected composition of k-spaced amino acid pairs (undirected-CKSAAP), and pseudo-position-specific scoring matrix (PsePSSM). The Chi-MIC-share algorithm is used for feature selection, which simplifies the model and improves predictive accuracy. For imbalanced classification, a backtracking method based on a local chi-square test is designed, and cost-sensitive learning is then incorporated to construct a novel classifier named ChiMIC-based balanced decision table (CBDT). On a training set with a 1:49 ratio of positives to negatives, the CBDT classifier achieves significantly better prediction performance than traditional classifiers. Moreover, independent tests on separate human and mouse glycoproteins show that CBDT-Oglyc outperforms previous methods in global accuracy. CBDT-Oglyc shows great promise in predicting Oglyc sites and is expected to facilitate further experimental studies on protein glycosylation.
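
To make the sequence features named in the abstract more concrete, the following is a minimal sketch of AAC and an undirected CKSAAP encoding. It is not the authors' implementation. The function names, the `k_max` default, the normalization by pair count, and the reading of "undirected" as treating the pairs AB and BA as one feature are all assumptions made for illustration. PsePSSM is omitted because it requires a PSSM profile from an external alignment tool.

```python
from collections import Counter
from itertools import combinations_with_replacement

# The 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 210 unordered residue pairs (AA, AC, ..., YY).
# Assumption: "undirected" means AB and BA are counted as the same pair.
UNORDERED_PAIRS = ["".join(p) for p in combinations_with_replacement(AMINO_ACIDS, 2)]


def aac(seq):
    """Amino acid composition: fraction of each standard residue in seq."""
    counts = Counter(seq)
    n = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def undirected_cksaap(seq, k_max=2):
    """Frequencies of unordered residue pairs separated by k = 0..k_max residues.

    Returns 210 * (k_max + 1) values. k_max=2 is a hypothetical default,
    not a setting taken from the paper.
    """
    features = []
    for k in range(k_max + 1):
        counts = Counter()
        total = 0
        for i in range(len(seq) - k - 1):
            a, b = seq[i], seq[i + k + 1]
            # Skip pairs containing non-standard symbols (e.g. padding 'X').
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts["".join(sorted((a, b)))] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in UNORDERED_PAIRS)
    return features


# Hypothetical peptide window centered on a candidate Ser/Thr site.
window = "PAPTSAPSTTPAS"
vector = aac(window) + undirected_cksaap(window, k_max=2)
print(len(vector))  # 20 AAC features + 3 * 210 pair features
```

Under these assumptions, merging AB and BA reduces each gap's 400 ordered pairs to 210 features, which shrinks the feature space before any feature selection step is applied.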
