School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, 518055, China.
Comput Biol Med. 2022 Oct;149:105940. doi: 10.1016/j.compbiomed.2022.105940. Epub 2022 Aug 13.
Proteins interact with nucleic acids to regulate the life activities of organisms, so identifying nucleic acid-binding proteins (NABPs) accurately and efficiently is important. Several sequence-based computational methods have been proposed to identify DNA- and RNA-binding proteins. However, the benchmark datasets used by these methods ignore the real-world proportion of NABPs, and some ensemble methods integrate only traditional machine learning algorithms, which limits prediction performance. In this study, we propose a sequence-based method called iDRBP-ECHF to predict DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs). We constructed a benchmark dataset that reflects the real-world proportion of positive and negative samples, and used down-sampling to generate three relatively balanced datasets for training iDRBP-ECHF. In addition, we incorporated deep learning algorithms into the framework to obtain a more compact high-level feature representation of the input data. Results on two independent datasets show that iDRBP-ECHF achieves state-of-the-art performance and outperforms existing sequence-based DBP and RBP prediction methods. A web server for iDRBP-ECHF is available at http://bliulab.net/iDRBP-ECHF.
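The down-sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the exact procedure, so everything here is an assumption: the function name `downsample_balanced_subsets`, the 1:1 class ratio, independent random draws for each subset, and the binary labels (the actual method distinguishes DBPs, RBPs, and non-binding proteins).

```python
import random


def downsample_balanced_subsets(positives, negatives, n_subsets=3, ratio=1.0, seed=0):
    """Build n_subsets roughly balanced training sets by down-sampling negatives.

    Hypothetical helper for illustration only; not the authors' procedure.
    Each subset keeps every positive sample and draws a random subset of
    negatives whose size is ratio * len(positives), capped at len(negatives).
    """
    rng = random.Random(seed)
    per_subset = min(len(negatives), int(round(len(positives) * ratio)))
    subsets = []
    for _ in range(n_subsets):
        # Sample negatives without replacement within this subset.
        sampled = rng.sample(negatives, per_subset)
        data = [(x, 1) for x in positives] + [(x, 0) for x in sampled]
        rng.shuffle(data)
        subsets.append(data)
    return subsets


# Toy identifiers standing in for protein sequences (assumed data).
positives = [f"P{i}" for i in range(10)]
negatives = [f"N{i}" for i in range(100)]
subsets = downsample_balanced_subsets(positives, negatives)
```

Each of the three subsets could then train a separate base model whose outputs are combined, which is one common way to use a large negative class without letting it dominate any single training set.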