Yu Shaoyou, Peng Dejun, Zhu Wen, Liao Bo, Wang Peng, Yang Dongxuan, Wu Fangxiang
Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.
Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China.
Front Pharmacol. 2022 Oct 10;13:1031759. doi: 10.3389/fphar.2022.1031759. eCollection 2022.
DNA-binding proteins (DBPs) play an essential role in the genetics and evolution of organisms. Specific DNA sequences may offer potential therapeutic benefits for hereditary diseases and cancers. Studying these proteins helps clarify their mechanisms in a timely and effective way and can contribute to disease prevention and treatment. However, identifying DBPs from sequence databases is time-consuming, costly, and inefficient. Efficient methods for improving DBP classification are therefore crucial to disease research. In this paper, we develop a novel predictor, Hybrid_DBP, which identifies potential DBPs using hybrid features and convolutional neural networks. The method combines two feature extraction methods, MonoDiKGap and Kmer, and then uses MRMD2.0 to remove redundant features. In our results, 94% of DBPs were correctly recognized, and accuracy on the independent test set reached 91.2%. These results suggest that Hybrid_DBP can serve as a useful tool for predicting DBPs.
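To make the Kmer feature mentioned in the abstract more concrete, the following is a minimal sketch of k-mer frequency encoding for a single sequence. It is not the authors' implementation: the function name `kmer_features`, the choice of the 20 standard amino acids as the alphabet, the default `k=2`, and the normalization by the number of valid windows are all assumptions made for illustration. MonoDiKGap, MRMD2.0, and the convolutional network are not reproduced here.

```python
from itertools import product

# Assumption: sequences are proteins over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=2, alphabet=AMINO_ACIDS):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper for illustration only; the feature vector has
    len(alphabet) ** k entries, one per possible k-mer.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    # Slide a window of length k across the sequence.
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows containing non-standard symbols
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[m] / windows for m in kmers]
```

For example, `kmer_features("ACAC", k=2)` yields a 400-dimensional vector in which only the entries for "AC" and "CA" are non-zero. A vector of this kind could be concatenated with other descriptors before feature selection, which is the role the abstract assigns to MRMD2.0.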