Ronin Institute, 127 Haddon Pl, Montclair, NJ 07043-2314, USA.
Genes (Basel). 2022 Oct 28;13(11):1970. doi: 10.3390/genes13111970.
This study investigates distinct signatures and codes within different sequence locations of the human genome. The promoter and other non-coding regions contain binding sites for biological particles involved in processes such as transcription regulation, yet the specific rules and sequence codes that govern this binding remain poorly understood. To derive these codes, the general design of the sequences is investigated. Genomic signatures are a powerful tool for assessing this general design and for cross-comparing genomic regions by their distinct sequence properties; they are also used here to assess the relative non-randomness of sequences. Furthermore, a binary components analysis based on ideas from information theory is carried out to study the RY (purine/pyrimidine), WS (weak/strong) and KM (keto/amino) signatures in the sequences. This comparison identifies the relative importance of these properties within the various protein-coding and non-coding genomic locations. The results show that coding DNA has a strongly non-random WS signature, which reflects the genetic code and the hydrogen-bond base pairing of codon-anticodon interactions. In contrast, non-coding locations such as the promoter carry a distinct genomic signature. A prominent feature throughout non-coding DNA is a highly non-random RY signature, which differs markedly from that of coding DNA and suggests a structure-based RY code. This marks progress towards deciphering the unknown code(s) in non-protein-coding DNA and towards a further understanding of coding DNA. Additionally, it sheds light on how DNA carries information. These findings have implications for the most fundamental principles of biology, including knowledge of gene regulation, development and disease.
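The binary components idea can be illustrated with a short sketch. The abstract does not state which information-theoretic measure the study uses, so the following is a minimal illustration under stated assumptions. Each sequence is reduced to the two-letter RY, WS or KM alphabet (standard IUPAC groupings). The Shannon entropy H_k of its overlapping binary k-mers is then computed, and non-randomness is scored as a redundancy 1 - H_k/k. A random binary sequence scores near 0, and a perfectly repetitive one scores 1. The function names and the redundancy score are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2

# Binary reductions of the DNA alphabet (standard IUPAC groupings):
#   RY: purine (A, G) vs pyrimidine (C, T)
#   WS: weak, 2 H-bonds (A, T) vs strong, 3 H-bonds (C, G)
#   KM: keto (G, T) vs amino (A, C)
BINARY_MAPS = {
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "WS": {"A": "W", "T": "W", "C": "S", "G": "S"},
    "KM": {"G": "K", "T": "K", "A": "M", "C": "M"},
}


def to_binary(seq: str, scheme: str) -> str:
    """Map a DNA sequence onto a two-letter alphabet, skipping non-ACGT symbols."""
    mapping = BINARY_MAPS[scheme]
    return "".join(mapping[b] for b in seq.upper() if b in mapping)


def kmer_entropy(binary_seq: str, k: int) -> float:
    """Shannon entropy (bits) of the overlapping k-mer frequencies."""
    if len(binary_seq) < k:
        raise ValueError("sequence shorter than k")
    counts = Counter(binary_seq[i:i + k] for i in range(len(binary_seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def redundancy(binary_seq: str, k: int) -> float:
    """Hypothetical non-randomness score: 1 - H_k / k.

    The maximum entropy of binary k-mers is k bits, so a random sequence
    scores close to 0 and a single repeated k-mer scores 1.
    """
    return 1.0 - kmer_entropy(binary_seq, k) / k


def binary_profile(seq: str, k: int = 3) -> dict:
    """Compare RY, WS and KM non-randomness for one DNA sequence."""
    return {scheme: redundancy(to_binary(seq, scheme), k) for scheme in BINARY_MAPS}
```

For example, `binary_profile("ACGT" * 30)` scores RY higher than WS or KM. The RY reduction of this sequence is a strict RYRY alternation, which yields only two distinct trimers, whereas its WS and KM reductions each contain four. Applying the same profile to promoter and coding segments, for instance from a hypothetical `promoter_seq` and `cds_seq`, mirrors the kind of cross-region comparison described above. It is not a reproduction of the study's method.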