PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method.

Affiliations

College of Information, Shanghai Ocean University, Shanghai 201306, China.

School of Engineering, University of Melbourne, Victoria 3010, Australia.

Publication information

Biomed Res Int. 2020 Apr 13;2020:7297631. doi: 10.1155/2020/7297631. eCollection 2020.

Abstract

DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activity. However, identifying DBPs with wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs based solely on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden Markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, four base classifiers are trained with the HMM-based composition features. In the second stage, the prediction probabilities of these base classifiers are used as inputs to a meta-classifier, which performs the final prediction of DBPs. On the PDB1075 benchmark dataset, a jackknife cross-validation of the proposed PredDBP-Stack predictor yields a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This result outperforms most existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that PredDBP-Stack is an effective classifier for accurately identifying DBPs from protein sequence information alone.
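The two feature types named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions here are assumptions based on a common profile-composition formulation: AAC is taken as the column-wise mean of an L x 20 profile, and TPC as the row-normalised sum of products between adjacent positions. The function names `aac`, `tpc`, and `hmm_features` are hypothetical, and the profile is assumed to be already parsed and scaled to non-negative values.

```python
from typing import List

# Assumed input: L rows (sequence positions) x 20 columns (amino-acid types),
# already parsed from an HMM profile and scaled to non-negative values.
Profile = List[List[float]]


def aac(profile: Profile) -> List[float]:
    """Assumed AAC definition: mean of each profile column."""
    length = len(profile)
    n_cols = len(profile[0])
    return [sum(row[j] for row in profile) / length for j in range(n_cols)]


def tpc(profile: Profile) -> List[float]:
    """Assumed TPC definition: row-normalised adjacent-position products."""
    n_cols = len(profile[0])
    # raw[i][j] accumulates profile[k][i] * profile[k + 1][j] over positions k.
    raw = [[0.0] * n_cols for _ in range(n_cols)]
    for prev_row, next_row in zip(profile, profile[1:]):
        for i in range(n_cols):
            for j in range(n_cols):
                raw[i][j] += prev_row[i] * next_row[j]
    features: List[float] = []
    for i in range(n_cols):
        row_total = sum(raw[i])
        for j in range(n_cols):
            # Guard against an all-zero row to avoid division by zero.
            features.append(raw[i][j] / row_total if row_total else 0.0)
    return features


def hmm_features(profile: Profile) -> List[float]:
    """Concatenate the AAC and TPC vectors into one feature vector."""
    return aac(profile) + tpc(profile)
```

Under these assumptions, a 20-column profile gives 20 AAC values and 400 TPC values, so `hmm_features` returns a vector of length 420 regardless of sequence length. A fixed-length vector of this kind is what first-stage base classifiers could be trained on.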

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36c9/7174956/e533646f649d/BMRI2020-7297631.001.jpg
