The Bioinformatics Centre, Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark.
Genome Res. 2011 Nov;21(11):1929-43. doi: 10.1101/gr.112516.110. Epub 2011 Oct 12.
Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript-destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.