DNA Sequence Perplexity Reveals Evolutionarily Conserved Patterns in cis-Regulatory Regions Across Diverse Species.

Authors

Gummadi Aruna Sesha Chandrika, Yella Venkata Rajesh

Affiliation

Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522302, India.

Publication information

Biochem Genet. 2025 Aug 21. doi: 10.1007/s10528-025-11231-y.

Abstract

Deciphering cis-regulatory regions in genomes is essential for understanding various physiological processes and pathological mechanisms. Regulatory signatures, namely promoter motifs, transcription factor binding sites, enhancers, GC content, CpG islands, DNA structural motifs, and other cis-regulatory features, are well-established for their roles in transcriptional regulation. However, these features often exhibit species-specific variations, challenging the identification of conserved regulatory principles across different genomes. In this study, we introduce DNA sequence perplexity as an innovative and efficient information-theoretic metric for characterizing cis-regulatory regions. Derived from information theory and natural language processing, perplexity quantifies the complexity and predictability of sequence, offering a motif-independent framework for DNA analysis. By examining transcription and translation start site regions across 1180 species spanning diverse taxa, we demonstrate that cis-regulatory regions consistently exhibit lower perplexity compared to adjacent flanking regions. This trend persists irrespective of taxonomic classification, establishing perplexity as an evolutionarily conserved pattern of regulatory DNA. Additionally, we observe an inverse correlation between perplexity and promoter strength in yeast datasets, suggesting that higher transcriptional outputs are associated with markedly reduced sequence perplexity. Our findings reveal that perplexity may hold valuable insights into the generalizable aspects of cis-regulatory DNA architecture. Integrating this abstraction-based strategy with motif-based approaches and high-throughput functional datasets could enhance its applicability in predictive applications across comparative and functional genomics.
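The abstract does not state how perplexity was computed (k-mer size, window length, or language model), so the following is only a minimal sketch of one common information-theoretic definition: perplexity as 2 raised to the Shannon entropy of the empirical k-mer distribution within a window. The function names `kmer_perplexity` and `sliding_perplexity`, and the parameters `k`, `window`, and `step`, are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def kmer_perplexity(seq: str, k: int = 1) -> float:
    """Perplexity of the empirical k-mer distribution of `seq`.

    Assumed definition (not taken from the paper):
        H = -sum_i p_i * log2(p_i)   over observed k-mers
        perplexity = 2 ** H
    A low value means a few k-mers dominate (more predictable sequence).
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2 ** entropy


def sliding_perplexity(seq: str, window: int, step: int, k: int = 1) -> list[float]:
    """Perplexity of each window of length `window`, advancing by `step`."""
    return [
        kmer_perplexity(seq[i:i + window], k)
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy example: a homopolymer window versus a window using all four bases.
profile = sliding_perplexity("AAAAACGT", window=4, step=4)
print(profile)
```

Under this definition, a window containing a single repeated base has perplexity 1, while a window with all four bases equally represented reaches the single-nucleotide maximum of 4. A profile of such values around a transcription or translation start site could then be compared with the flanking regions, which is the kind of contrast the abstract reports.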
