Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent, Belgium.
Department of Information Technology, IDLab, Ghent University-imec, Gent, Belgium.
DNA Res. 2022 Jun 25;29(4). doi: 10.1093/dnares/dsac029.
With the decreasing cost of sequencing and the growing number of sequenced genomes, comparative genomics is becoming increasingly attractive as a complement to experimental techniques for identifying transcription factor (TF) binding sites. In this study, we redesigned BLSSpeller, a motif discovery algorithm, to cope with larger sequence datasets. BLSSpeller was used to identify novel motifs in Zea mays in a comparative genomics setting with 16 monocot lineages. We discovered 61 motifs, of which 20 matched previously described motif models in Arabidopsis. In addition, novel, as yet uncharacterized motifs were detected, several of which are supported by available sequence-based and/or functional data. Instances of the predicted motifs were enriched around transcription start sites and contained signatures of selection. Moreover, the enrichment of the predicted motif instances in open chromatin and TF binding sites indicates that they are functional, a conclusion further supported by the observation that genes carrying instances of these motifs were often co-expressed and/or enriched in similar GO functions. Overall, our study unveiled several novel candidate motifs that might improve our understanding of the genotype-to-phenotype association in crops.