Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions.

Author information

Torarinsson Elfar, Yao Zizhen, Wiklund Eric D, Bramsen Jesper B, Hansen Claus, Kjems Jørgen, Tommerup Niels, Ruzzo Walter L, Gorodkin Jan

Affiliation

Section for Genetics and Bioinformatics, IBVH, Faculty of Life Sciences, University of Copenhagen, 1870 Frederiksberg C, Denmark.

Publication information

Genome Res. 2008 Feb;18(2):242-51. doi: 10.1101/gr.6887408. Epub 2007 Dec 20.

DOI:10.1101/gr.6887408

PMID:18096747

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC2203622/

Abstract

Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure (frequent compensating base changes) is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number of potential RNA structures in the ENCODE regions. We report 6587 candidate regions with an estimated false-positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than those based on primary sequence alone, often quite dramatically. For example, approximately one-quarter of our predicted motifs show revisions in >50% of their aligned positions. Furthermore, our results are strongly complementary to those discovered by sequence-alignment-based approaches: 84% of our candidates are not covered by Washietl et al., increasing the number of ncRNA candidates in the ENCODE region by 32%. In a group of 11 ncRNA candidates that were tested by RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue, and most show evidence of significant differential expression across tissues. Our results broadly suggest caution in any analysis relying on multiple sequence alignments in less well-conserved regions, clearly support growing appreciation for the biological significance of ncRNAs, and strongly support the argument for considering RNA structure directly in any searches for these elements.
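The abstract's central point, that compensating base changes preserve a structure while eroding sequence similarity, can be illustrated with a minimal Python sketch. The two toy hairpin sequences and both helper functions below are hypothetical, chosen only for illustration. They are not taken from the paper and do not reproduce how CMfinder works.

```python
# Toy illustration (hypothetical sequences, not data from the paper):
# compensating base changes keep every stem pair intact while the
# sequences themselves become dissimilar.

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def hairpin_pairs(seq: str, stem_len: int) -> int:
    """Count stem positions whose base can pair with the mirrored base at the 3' end."""
    return sum((seq[i], seq[-1 - i]) in PAIRS for i in range(stem_len))


# Assumed layout: 6-nt 5' stem, 4-nt loop, 6-nt 3' stem.
ancestor = "GGCAUC" + "GAAA" + "GAUGCC"
variant = "CAGUCG" + "GAAA" + "CGACUG"  # every stem pair swapped for another valid pair

identity = percent_identity(ancestor, variant)
ancestor_pairs = hairpin_pairs(ancestor, 6)
variant_pairs = hairpin_pairs(variant, 6)
print(f"identity: {identity:.0f}%  stem pairs: {ancestor_pairs} vs {variant_pairs}")
```

In this toy case only the loop is identical between the two sequences, yet both fold into the same six-pair stem. That is the kind of signal a purely sequence-based aligner could misplace or fail to align at all.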

References

How accurately is ncRNA aligned within whole-genome multiple alignments?

BMC Bioinformatics. 2007 Oct 26;8:417. doi: 10.1186/1471-2105-8-417.

Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix.

PLoS Comput Biol. 2007 Oct;3(10):1896-908. doi: 10.1371/journal.pcbi.0030193. Epub 2007 Aug 20.

Detection of RNA structures in porcine EST data and related mammals.

BMC Genomics. 2007 Sep 10;8:316. doi: 10.1186/1471-2164-8-316.

The mRNA-like noncoding RNA Gomafu constitutes a novel nuclear domain in a subset of neurons.

J Cell Sci. 2007 Aug 1;120(Pt 15):2498-506. doi: 10.1242/jcs.009357. Epub 2007 Jul 10.

Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline.

Nucleic Acids Res. 2007;35(14):4809-19. doi: 10.1093/nar/gkm487. Epub 2007 Jul 9.

A computational pipeline for high-throughput discovery of cis-regulatory noncoding RNA in prokaryotes.

PLoS Comput Biol. 2007 Jul;3(7):e126. doi: 10.1371/journal.pcbi.0030126.

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Nature. 2007 Jun 14;447(7146):799-816. doi: 10.1038/nature05874.

Structured RNAs in the ENCODE selected regions of the human genome.

Genome Res. 2007 Jun;17(6):852-64. doi: 10.1101/gr.5650707.

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome.

Genome Res. 2007 Jun;17(6):760-74. doi: 10.1101/gr.6034307.

RNA maps reveal new RNA classes and a possible function for pervasive transcription.

Science. 2007 Jun 8;316(5830):1484-8. doi: 10.1126/science.1138341. Epub 2007 May 17.
