Taboada-Castro Hermenegildo, Hernández-Álvarez Alfredo José, Castro-Mondragón Jaime A, Encarnación-Guevara Sergio
Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico.
Centre for Molecular Medicine Norway, Nordic EMBL Partnership, University of Oslo, Oslo, Norway.
Bioinform Biol Insights. 2024 Sep 6;18:11779322241272395. doi: 10.1177/11779322241272395. eCollection 2024.
RhizoBindingSites is a depurified database of conserved DNA motifs potentially involved in the transcriptional regulation of five genera covering 9 representative symbiotic species. The motifs were deduced from the upstream regulatory sequences of orthologous genes from the Rhizobiales taxon (O-matrices). The sites collected with O-matrices for each gene in each genome of RhizoBindingSites were then used to deduce new matrices with the dyad-analysis method of the Regulatory Sequence Analysis Tools (RSAT). This produced novel S-matrices, which were used to build the RhizoBindingSites v2.0 database. Compared with the O-matrix logos, the S-matrix logos showed higher frequencies and/or redefined nucleotides at specific positions. Moreover, S-matrices detected more genes in the genome than O-matrices did, and more transcription factors (TFs) in their vicinity, consistent with the greater genomic coverage of S-matrices. O-matrices of 3187 TFs and S-matrices of 2754 TFs from the 9 species were deposited in RhizoBindingSites and RhizoBindingSites v2.0, respectively. Homology among the matrices of TFs from the same genome indicated inter-regulation among the clustered TFs. In addition, the matrices of orthologous AraC, ArsR, GntR, and LysR TFs showed different motifs, suggesting distinct regulation. For CFN42, bv. 3841, and 1021, benchmarking showed 72%, 68%, and 81% common genes per regulon, respectively, for O-matrices, and approximately 14% fewer common genes for S-matrices. These data are available in the RhizoBindingSites and RhizoBindingSites v2.0 databases (http://rhizobindingsites.ccg.unam.mx/).
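The O- and S-matrices are position-specific matrices built from collections of binding sites, and their logos and genome scans depend on a few standard calculations. The sketch below is a generic illustration of those calculations, not the RSAT dyad-analysis or matrix-scan pipeline used by the authors. It shows how a count matrix is built from equal-length aligned sites, how per-position information content (the stack height in a sequence logo) is computed, and how a log-odds matrix scores windows of a sequence. The example sites, the pseudocount of 0.25, the uniform background, and the score threshold are all hypothetical choices made for illustration.

```python
import math

ALPHABET = "ACGT"

def count_matrix(sites):
    """Column-wise nucleotide counts for equal-length aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    counts = [{b: 0 for b in ALPHABET} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            if base in ALPHABET:  # ambiguous bases such as N are skipped
                counts[i][base] += 1
    return counts

def frequency_matrix(counts, pseudocount=0.25):
    """Relative frequencies; the per-base pseudocount is an assumed value."""
    freqs = []
    for col in counts:
        total = sum(col.values()) + pseudocount * len(ALPHABET)
        freqs.append({b: (col[b] + pseudocount) / total for b in ALPHABET})
    return freqs

def information_content(freqs):
    """Bits per position against a uniform background (logo stack height)."""
    return [2.0 + sum(p * math.log2(p) for p in col.values() if p > 0)
            for col in freqs]

def consensus(freqs):
    """Most frequent base at each position."""
    return "".join(max(col, key=col.get) for col in freqs)

def log_odds(freqs, background=0.25):
    """Log2 ratio of motif frequency to an assumed uniform background."""
    return [{b: math.log2(col[b] / background) for b in ALPHABET} for col in freqs]

def scan(sequence, weights, threshold):
    """Forward-strand scan: (start, score) for windows scoring >= threshold."""
    width = len(weights)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in ALPHABET for b in window):
            continue
        score = sum(weights[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

if __name__ == "__main__":
    # Hypothetical aligned sites, not taken from RhizoBindingSites.
    sites = ["TTGACA", "TTGACT", "TTGACA", "CTGACA"]
    freqs = frequency_matrix(count_matrix(sites))
    print(consensus(freqs))  # TTGACA
    print([round(x, 2) for x in information_content(freqs)])
    print(scan("GGTTGACAGG", log_odds(freqs), threshold=5.0))
```

Conserved columns (all four sites agree) receive more bits than variable ones. A sharper, better-defined matrix therefore draws taller logo stacks and separates true sites from background more clearly when scanning.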