School of Medicine, Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen, Guangdong Province 518172, China.
Warshel Institute for Computational Biology, School of Medicine, Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen, Guangdong Province 518172, China.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad138.
Cyclic AMP receptor proteins (CRPs) are important transcription regulators in many species. The prediction of CRP-binding sites has mainly been based on position weight matrices (PWMs). Traditional prediction methods only considered known binding motifs, so their ability to discover inflexible binding patterns was limited. We therefore developed CRPBSFinder, a novel CRP-binding site prediction model that combines a hidden Markov model, knowledge-based PWMs and structure-based binding affinity matrices. We trained this model on validated CRP-binding data from Escherichia coli and evaluated it with computational and experimental methods. The results show that the model not only achieves higher prediction performance than a classic method but also quantitatively indicates the binding affinity of transcription factor binding sites through its prediction scores. The predictions included most of the known regulated genes as well as 1089 novel CRP-regulated genes. The major regulatory roles of CRPs were divided into four classes: carbohydrate metabolism, organic acid metabolism, nitrogen compound metabolism and cellular transport. Several novel functions were also discovered, including heterocycle metabolism and response to stimulus. Based on the functional similarity of homologous CRPs, we applied the model to 35 other species. The prediction tool and the prediction results are available online at: https://awi.cuhk.edu.cn/~CRPBSFinder.
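As a rough illustration of the PWM baseline the abstract contrasts against, the sketch below builds a log-odds PWM from aligned sites and scans both strands of a sequence. This is not CRPBSFinder itself, which additionally uses a hidden Markov model and structure-based binding affinity matrices. The toy sites are made-up variants of the TGTGA-N6-TCACA CRP consensus (not validated E. coli data), and the pseudocount of 0.5 and uniform background are assumptions chosen for illustration; the helper names `build_pwm`, `scan` and the rest are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Return a log-odds PWM: one dict per position mapping base -> log2(p / bg).

    Assumption: a uniform background and a fixed pseudocount, for illustration only.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        # Count each base at position i, starting from the pseudocount.
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm


def reverse_complement(seq):
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def score_window(pwm, window):
    # Sum of per-position log-odds scores for one window.
    return sum(col[b] for col, b in zip(pwm, window))


def scan(pwm, seq):
    """Score every window on both strands; return a list of (start, strand, score)."""
    w = len(pwm)
    hits = []
    for start in range(len(seq) - w + 1):
        window = seq[start:start + w]
        hits.append((start, "+", score_window(pwm, window)))
        hits.append((start, "-", score_window(pwm, reverse_complement(window))))
    return hits


# Hypothetical toy sites following the TGTGA-N6-TCACA pattern (not real data).
toy_sites = [
    "TGTGATCTAGATCACA",
    "TGTGACGCTAATCACA",
    "TGTGAGTTAGCTCACA",
    "TTTGATCGAGATCACA",
]
pwm = build_pwm(toy_sites)
seq = "GGCCAATGTGATCTAGATCACATTGGCC"
hits = scan(pwm, seq)
best = max(hits, key=lambda h: h[2])  # first window with the highest score
print(best[0], best[1], round(best[2], 2))
```

The scores are plain log-odds sums, so a higher value means a closer match to the training sites. In this simple scheme the score is only a sequence-similarity proxy; the abstract's claim that prediction scores indicate binding affinity refers to CRPBSFinder's combined model, not to this sketch.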