Bechtel Jason M, Wittenschlaeger Thomas, Dwyer Trisha, Song Jun, Arunachalam Sasi, Ramakrishnan Sadeesh K, Shepard Samuel, Fedorov Alexei
Program in Bioinformatics and Proteomics/Genomics, University of Toledo Health Science Campus, Toledo, OH 43614, USA.
BMC Genomics. 2008 Jun 12;9:284. doi: 10.1186/1471-2164-9-284.
Genomes exhibit several levels of non-randomness, most notably inhomogeneity in their nucleotide composition. This inhomogeneity is manifest from the short range, where neighboring nucleotides influence the choice of base at a site, to the long range (regions commonly known as isochores), where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS) plays in gene expression.
We present novel data and approaches showing that a mid-range inhomogeneity (~30 to 1000 nt) not only exists in mammalian genomes but is also significantly associated with strong RNA SS. We carried out a whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs) were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, strong local SS (< -25 kcal/mol) was two to ten times more abundant than expected under a random model. The randomization process preserves the short-range inhomogeneity of the corresponding natural sequences, thus eliminating short-range signals as possible contributors to any observed phenomena.
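One way to picture a randomization that preserves short-range inhomogeneity is a Markov-chain resampling of the natural sequence. The sketch below is an illustration only, not the procedure used in the study: the function name `markov_randomize`, the `order` parameter, and the fallback to overall base composition are all assumptions introduced here.

```python
import random
from collections import Counter, defaultdict


def markov_randomize(seq, order=1, rng=None):
    """Return a random sequence of the same length as `seq` whose
    short-range transition frequencies are sampled from `seq`.

    Hypothetical helper for illustration; `order` is the number of
    preceding bases that condition each new base.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    rng = rng or random.Random()
    if len(seq) <= order:
        return seq

    # Count which base follows each context of length `order`.
    transitions = defaultdict(Counter)
    for i in range(len(seq) - order):
        transitions[seq[i:i + order]][seq[i + order]] += 1
    overall = Counter(seq)  # fallback for contexts with no observed successor

    out = list(seq[:order])  # seed with the natural prefix
    while len(out) < len(seq):
        counts = transitions.get("".join(out[-order:])) or overall
        bases = list(counts)
        weights = [counts[b] for b in bases]
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out)
```

Because each base is drawn only from the successors observed after its local context, neighbor-dependent signals survive in the randomized sequence, while any composition pattern spanning longer distances is lost. Comparing the counts of strong local SS in natural and randomized sequences would then isolate effects that short-range signals cannot explain.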
We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little-explored phenomenon of genomic mid-range inhomogeneity (MRI). MRI is an interdependence between nucleotide choice and base composition over a distance of 20-1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI.
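As a rough illustration of what mid-range inhomogeneity means in practice, the sketch below measures how much GC content varies between consecutive windows of a chosen size. This is a simple proxy chosen for this example, not the MRI measure used in the study or its public resource; the function names `window_gc` and `gc_variance` and the use of non-overlapping windows are assumptions.

```python
from statistics import pvariance


def window_gc(seq, window):
    """GC fraction in consecutive non-overlapping windows of `window` nt."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def gc_variance(seq, window):
    """Population variance of windowed GC fractions (hypothetical proxy).

    A sequence whose composition drifts over the window scale gives a
    higher value than a sequence with the same overall composition
    spread evenly.
    """
    fractions = window_gc(seq, window)
    if len(fractions) < 2:
        raise ValueError("sequence must contain at least two full windows")
    return pvariance(fractions)


# Same overall composition, different mid-range organization.
blocky = "G" * 100 + "A" * 100   # GC-rich block followed by AT-rich block
even = "GA" * 100                # composition spread evenly
print(gc_variance(blocky, 20), gc_variance(even, 20))
```

Window sizes in the 20-1000 nt range would probe the scale the text calls mid-range; a natural sequence could be compared against a randomized control at each window size.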