Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae285.
Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. Many machine learning-based computational approaches have been developed to facilitate functional annotation and accurate prediction of different types of NABPs. However, the datasets used for training and testing, as well as the prediction scopes of these studies, have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and by developing deep learning-based methods, including both hierarchical and multi-class approaches, to predict the types of NABPs for any given protein. The deep learning models employ two convolutional neural network layers and one long short-term memory layer. Our approaches outperform existing DBP and RBP predictors, give balanced predictions between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for DBPs, with an improvement of ~12%. Moreover, we explored the prediction accuracy of single-stranded DNA-binding proteins and their effect on the overall accuracy of NABP predictions.
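To illustrate the difference between the hierarchical and multi-class approaches, the following minimal sketch contrasts the two decision schemes. It is not the published implementation: the function names, the 0.5 threshold, the three-class label set, and the example probabilities are all assumptions made for illustration, and the probabilities stand in for the outputs of trained models.

```python
from typing import Dict


def hierarchical_predict(p_nabp: float, p_dbp_given_nabp: float,
                         threshold: float = 0.5) -> str:
    """Two-stage decision (hypothetical helper).

    Stage 1 separates NABPs from non-NABPs; stage 2 is only consulted
    for proteins that pass stage 1 and separates DBPs from RBPs.
    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    if p_nabp < threshold:
        return "non-NABP"
    return "DBP" if p_dbp_given_nabp >= threshold else "RBP"


def multiclass_predict(probs: Dict[str, float]) -> str:
    """Single-stage decision (hypothetical helper).

    One model scores every class at once; the class with the highest
    probability is returned. The label set is an assumption.
    """
    return max(probs, key=probs.get)


# Made-up scores for a single protein, for illustration only.
print(hierarchical_predict(p_nabp=0.9, p_dbp_given_nabp=0.3))
print(multiclass_predict({"DBP": 0.2, "RBP": 0.7, "non-NABP": 0.1}))
```

In the hierarchical scheme an error in the first stage cannot be corrected by the second, whereas the multi-class scheme compares all classes in a single step.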