Zhang Shaojie, Haas Brian, Eskin Eleazar, Bafna Vineet
Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0114, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2005 Oct-Dec;2(4):366-79. doi: 10.1109/TCBB.2005.57.
The discovery of novel noncoding RNAs has been among the most exciting recent developments in biology. It has been hypothesized that there is, in fact, an abundance of functional noncoding RNAs (ncRNAs) with various catalytic and regulatory functions. However, the inherent signal for ncRNAs is weaker than the signal for protein-coding genes, making ncRNAs harder to identify. We consider the following problem: given an RNA sequence with a known secondary structure, efficiently detect all structural homologs in a genomic database by computing the sequence and structure similarity to the query. Our approach, FastR, is based on structural filters that eliminate a large portion of the database while retaining the true homologs, and it allows us to search a typical bacterial genome in minutes on a standard PC. The results are two orders of magnitude better than the currently available software for the problem. We applied FastR to the discovery of novel riboswitches, a class of RNA domains found in the untranslated regions of mRNAs. They are of interest because they regulate metabolite synthesis by directly binding metabolites. We searched all available eubacterial and archaeal genomes for riboswitches from the purine, lysine, thiamin, and riboflavin subfamilies. Our results point to a number of novel candidates for each of these subfamilies and include genomes that were not previously known to contain riboswitches.
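The structural-filter idea in the abstract can be illustrated with a small sketch. It is not FastR's actual filter, whose details the abstract does not give. It is a simplified stand-in that keeps only those genome windows able to form at least one stem of `k` consecutive base pairs, so that a slower alignment step would run only on the surviving windows. The function names, the parameters `w`, `k`, and `min_loop`, and the choice to allow G-U wobble pairs are all assumptions made for illustration. The input is assumed to be an uppercase RNA string.

```python
from typing import List, Tuple

# Watson-Crick pairs plus the G-U wobble pair (an illustrative assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_stack(seq: str, i: int, j: int, k: int) -> bool:
    """True if seq[i:i+k] pairs with seq[j:j+k] read backwards (a k-bp stem)."""
    return all((seq[i + t], seq[j + k - 1 - t]) in PAIRS for t in range(k))


def has_stem(window: str, k: int, min_loop: int) -> bool:
    """True if the window contains two k-mers that can form a stem
    with at least min_loop unpaired bases between the two arms."""
    n = len(window)
    for i in range(n - 2 * k - min_loop + 1):
        for j in range(i + k + min_loop, n - k + 1):
            if can_stack(window, i, j, k):
                return True
    return False


def filter_windows(genome: str, w: int, k: int, min_loop: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of length-w windows that pass the filter.

    Windows that fail are discarded and never reach the expensive
    sequence-and-structure alignment step (not implemented here).
    """
    hits = []
    for start in range(len(genome) - w + 1):
        if has_stem(genome[start:start + w], k, min_loop):
            hits.append((start, start + w))
    return hits


if __name__ == "__main__":
    toy_genome = "AAAAAAAA" + "GGGGAAAACCCC" + "AAAAAAAA"
    print(filter_windows(toy_genome, w=12, k=4))
```

In this toy genome only the window that fully covers the `GGGG...CCCC` hairpin survives, which is the behaviour a filter aims for: most of the database is discarded cheaply, while regions that could fold like the query are retained for full scoring. A practical filter would also be tuned so that true homologs are rarely lost, a trade-off this sketch does not attempt to model.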