School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.
Huajian Yutong Technology (Beijing) Co., Ltd.
Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad251.
Nucleic acid-binding proteins interact with DNA and RNA to regulate gene expression and control transcription. The pathogenesis of many human diseases is related to abnormal gene expression, so recognizing nucleic acid-binding proteins accurately and efficiently has important implications for disease research. To address this problem, several methods have been proposed that identify nucleic acid-binding proteins from sequence information. However, different types of nucleic acid-binding proteins have different subfunctions, and these methods ignore such internal differences, which leaves room to improve predictor performance. In this study, we propose a new method, called iDRPro-SC, that predicts the type of nucleic acid-binding protein from sequence information. iDRPro-SC considers the internal differences among nucleic acid-binding proteins and combines their subfunctions to build a complete dataset. Additionally, we use ensemble learning to characterize and predict nucleic acid-binding proteins. Results on the test dataset showed that iDRPro-SC achieved the best prediction performance and outperformed the other existing nucleic acid-binding protein prediction methods. We have established a web server that can be accessed online: http://bliulab.net/iDRPro-SC.
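As a rough illustration of what sequence-based ensemble prediction can look like, the sketch below averages the class probabilities of two simple classifiers, each working on a different sequence-derived feature set (soft voting). It is not the iDRPro-SC model: the abstract does not specify the features, base learners, or class labels, so the feature functions, the nearest-centroid learner, the label names, and the toy sequences are all assumptions made up for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def charge_profile(seq):
    """Fractions of positively (K, R, H) and negatively (D, E) charged residues."""
    n = len(seq) or 1
    positive = sum(seq.count(a) for a in "KRH") / n
    negative = sum(seq.count(a) for a in "DE") / n
    return [positive, negative]


class NearestCentroid:
    """Hypothetical base learner: one mean feature vector per class."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.centroids = {}

    def fit(self, seqs, labels):
        grouped = {}
        for seq, label in zip(seqs, labels):
            grouped.setdefault(label, []).append(self.featurize(seq))
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict_proba(self, seq):
        # Turn distances to the centroids into normalized scores.
        x = self.featurize(seq)
        scores = {
            label: math.exp(-math.dist(x, centroid))
            for label, centroid in self.centroids.items()
        }
        norm = sum(scores.values())
        return {label: s / norm for label, s in scores.items()}


def ensemble_predict(models, seq):
    """Soft voting: average the class probabilities of all base models."""
    totals = Counter()
    for model in models:
        for label, p in model.predict_proba(seq).items():
            totals[label] += p / len(models)
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Toy, made-up sequences and labels; not real proteins or real annotations.
train_seqs = ["MKRKRSKHRK", "MRGGSRGGYR", "MDEELLAEDV"]
train_labels = ["DNA-binding", "RNA-binding", "non-binding"]

models = [
    NearestCentroid(composition).fit(train_seqs, train_labels),
    NearestCentroid(charge_profile).fit(train_seqs, train_labels),
]
label, probs = ensemble_predict(models, "MKKRHRSKRK")
print(label, probs)
```

Averaging probabilities rather than counting hard votes lets a confident base model outweigh an uncertain one, which is one common reason to combine learners built on complementary feature views.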