iDRBP-ECHF: Identifying DNA- and RNA-binding proteins based on extensible cubic hybrid framework.

Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.

School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, 518055, China.

Publication information

Comput Biol Med. 2022 Oct;149:105940. doi: 10.1016/j.compbiomed.2022.105940. Epub 2022 Aug 13.

Abstract

Proteins interact with nucleic acids to regulate the life activities of organisms, so accurately and efficiently identifying nucleic acid-binding proteins (NABPs) is particularly important. Several sequence-based computational methods have been proposed to identify DNA- and RNA-binding proteins. However, the benchmark datasets used by these methods ignore the real-world proportion of NABPs, and some integration methods integrate only traditional machine learning algorithms, which limits prediction performance. In this study, we propose a sequence-based method called iDRBP-ECHF to predict DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs). We constructed a benchmark dataset that reflects the real-world proportion of positive and negative samples, and used down-sampling to generate three relatively balanced datasets for training iDRBP-ECHF. In addition, we incorporated deep learning algorithms into the framework to obtain a more compact high-level feature representation of the input data. Results on two independent datasets show that iDRBP-ECHF achieves state-of-the-art performance and outperforms the other existing sequence-based DBP and RBP prediction methods. We also set up a webserver for iDRBP-ECHF, which can be accessed at http://bliulab.net/iDRBP-ECHF.
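
The down-sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three balanced datasets were drawn, so everything here is an assumption for illustration: the function name `downsample_balanced_subsets`, the choice to pair all positives with three disjoint random slices of the negatives, and the toy sample identifiers are hypothetical and are not taken from the iDRBP-ECHF paper.

```python
import random


def downsample_balanced_subsets(positives, negatives, n_subsets=3, seed=0):
    """Pair every positive sample with n_subsets disjoint random slices of the negatives.

    Assumption: this is one common down-sampling scheme, not necessarily
    the one used by iDRBP-ECHF.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Cap each slice at len(positives) so every subset stays close to 1:1.
    per_subset = min(len(positives), len(shuffled) // n_subsets)
    subsets = []
    for i in range(n_subsets):
        chunk = shuffled[i * per_subset:(i + 1) * per_subset]
        subsets.append((list(positives), chunk))
    return subsets


# Toy data: 10 binding proteins versus 50 non-binding proteins (hypothetical IDs).
positives = [f"P{i}" for i in range(10)]
negatives = [f"N{i}" for i in range(50)]
subsets = downsample_balanced_subsets(positives, negatives)
```

Each returned pair could then train one base model, with the models combined afterwards; whether iDRBP-ECHF combines them this way is not stated in the abstract.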
