Gadekar Veerendra P, Munk Alexander Welford, Miladi Milad, Junge Alexander, Backofen Rolf, Seemann Stefan E, Gorodkin Jan
Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark.
Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark.
NAR Genom Bioinform. 2024 Aug 9;6(3):lqae089. doi: 10.1093/nargab/lqae089. eCollection 2024 Sep.
RNA secondary structures play essential roles in the formation of the tertiary structure and function of a transcript. Recent genome-wide studies highlight significant potential for RNA structures in the mammalian genome. However, a major challenge is assigning functional roles to these structured RNAs. In this study, we conducted a guilt-by-association analysis of clusters of computationally predicted conserved RNA structures (CRSs) in human untranslated regions (UTRs) to associate them with gene functions. We filtered a broad pool of ∼500 000 human CRSs for UTR overlap, resulting in 4734 and 24 754 CRSs from the 5' and 3' UTRs of protein-coding genes, respectively. We clustered the two sets separately using RNAscClust, obtaining 793 and 2403 clusters with an average of five CRSs per cluster. We identified overrepresented binding sites for 60 and 43 RNA-binding proteins co-localizing with the clustered CRSs. Furthermore, 104 and 441 clusters from the 5' and 3' UTRs, respectively, showed enrichment for various Gene Ontology terms, including biological processes such as 'signal transduction' and 'nervous system development', molecular functions such as 'transferase activity', and cellular components such as 'synapse', among others. Our study shows that significant functional insights can be gained by clustering RNA structures based on their structural characteristics.
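The abstract does not state which statistical test was used to call a cluster "enriched" for a Gene Ontology term. As a minimal sketch only, assuming a one-sided hypergeometric overrepresentation test (a common choice for this kind of analysis, not confirmed by the source), the idea can be illustrated as follows. The function name `hypergeom_sf` and all counts in the example are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """Return P(X >= k) for a hypergeometric variable X.

    population: number of background genes (hypothetical)
    annotated:  background genes carrying the GO term
    drawn:      genes associated with the CRSs of one cluster
    k:          cluster genes carrying the GO term
    """
    lower = max(k, 0)
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probability mass over all outcomes at least as extreme as k.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(lower, upper + 1)
    )
    return favourable / total


# Hypothetical illustration: 1000 background genes, 50 of them annotated
# with a given term; a cluster of five genes contains three annotated ones.
p_value = hypergeom_sf(3, population=1000, annotated=50, drawn=5)
print(f"one-sided p-value: {p_value:.4g}")
```

In a real analysis the p-values would be computed for every cluster and term and then corrected for multiple testing; that step is omitted here.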