

Motif distribution in genomes gives insights into gene clustering and co-regulation.

Authors

Chakraborty Atreyi, Chopde Sumant, Madhusudhan Mallur Srivatsan

Affiliations

Department of Biology, Indian Institute of Science Education and Research, Dr Homi Bhabha Rd, Pashan, Pune, Maharashtra 411008, India.

Department of Data Science, Indian Institute of Science Education and Research, Dr Homi Bhabha Rd, Pashan, Pune, Maharashtra 411008, India.

Publication information

Nucleic Acids Res. 2025 Jan 7;53(1). doi: 10.1093/nar/gkae1178.

Abstract

We read the genome as proteins in the cell would, by studying the distributions of 5-6 base DNA motifs across the whole genome or across smaller stretches such as whole chromosomes or parts of them. This led to several findings about motif clustering and chromosome organization. The motif distribution in genomes is clearly not random at the length scales we examined, from 1 kb to entire chromosomes. The observed-to-expected (OE) ratios of motif distributions correlate strongly in pairs of chromosomes that are susceptible to translocations. With the aid of examples, we suggest that similarity in motif distributions in the promoter regions of genes could imply co-regulation. A simple extension of this idea allows us to construct gene regulatory networks. Further, we could make inferences about the spatial proximity of genomic fragments using these motif distributions. Spatially proximal regions, as deduced by Hi-C or pcHi-C, were ∼3.5 times more likely to have correlated motif distributions than non-proximal regions. These correlations had strong contributions from motifs recognized by the CTCF protein, which are known markers of topologically associating domains. In general, correlating genomic regions by motif distribution comparisons alone yields rich functional information.
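As a rough illustration of the kind of comparison the abstract describes, the sketch below counts overlapping k-mers in two sequences, converts the counts to observed-to-expected ratios, and correlates the two ratio profiles. It is not the authors' pipeline: the abstract does not say how the expected counts are defined, so the independent-base background model used here is an assumption, as are the function names (`kmer_counts`, `oe_ratios`, `region_correlation`) and the choice of Pearson correlation.

```python
from collections import Counter
from itertools import product
from math import sqrt

BASES = "ACGT"

def kmer_counts(seq, k):
    """Count overlapping k-mers that contain only A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(BASES):
            counts[kmer] += 1
    return counts

def oe_ratios(seq, k):
    """Observed/expected ratio for every possible k-mer.

    ASSUMPTION: the expected count treats bases as independent draws
    with the sequence's own base frequencies. The paper may define
    the expectation differently.
    """
    seq = seq.upper()
    observed = kmer_counts(seq, k)
    base_counts = Counter(b for b in seq if b in BASES)
    total_bases = sum(base_counts.values())
    total_kmers = sum(observed.values())
    if total_bases == 0 or total_kmers == 0:
        return {}
    freq = {b: base_counts[b] / total_bases for b in BASES}
    ratios = {}
    for kmer in map("".join, product(BASES, repeat=k)):
        prob = 1.0
        for b in kmer:
            prob *= freq[b]
        expected = prob * total_kmers
        ratios[kmer] = observed[kmer] / expected if expected > 0 else 0.0
    return ratios

def pearson(x, y):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(x)
    if n == 0 or n != len(y):
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def region_correlation(seq_a, seq_b, k=5):
    """Correlate the OE-ratio profiles of two regions (hypothetical helper)."""
    ra, rb = oe_ratios(seq_a, k), oe_ratios(seq_b, k)
    kmers = sorted(set(ra) & set(rb))
    return pearson([ra[m] for m in kmers], [rb[m] for m in kmers])
```

Calling `region_correlation(promoter_a, promoter_b, k=5)` on two promoter sequences would give a single similarity score. Real regions should be long enough that most of the 4^k possible motifs are observed at least a few times, otherwise the ratios are dominated by sampling noise.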


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5eb/11724300/62ce43d94d33/gkae1178figgra1.jpg
