Tseng Huei-Hun, Weinberg Zasha, Gore Jeremy, Breaker Ronald R, Ruzzo Walter L
Department of Computer Science & Engineering, University of Washington, Seattle, WA 98195-2350, USA.
J Bioinform Comput Biol. 2009 Apr;7(2):373-88. doi: 10.1142/s0219720009004126.
Non-coding RNAs (ncRNAs) are transcripts that do not code for proteins. Recent findings have shown that RNA-mediated regulatory mechanisms influence a substantial portion of typical microbial genomes. We present an efficient method for finding potential ncRNAs in bacteria by clustering genomic sequences based on homology inferred from both primary sequence and secondary structure. We evaluate our approach using a set of predominantly Firmicutes sequences. Our results showed that, although homology search based on primary sequence alone was inaccurate for diverged ncRNA sequences, our clustering method allowed us to infer motifs that recovered nearly all members of most known ncRNA families. Hence, our method shows promise for discovering new families of ncRNA.
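The abstract does not specify how the clustering step works. As a rough illustration only, the following sketch groups sequences by single-linkage clustering over pairwise homology scores using a union-find structure. The function name `cluster_by_homology`, the `score` callable, the threshold and the toy scores are all hypothetical; this is a generic stand-in technique, not the authors' algorithm or scoring scheme.

```python
from itertools import combinations


def cluster_by_homology(seq_ids, score, threshold):
    """Single-linkage clustering: any pair scoring >= threshold ends up in one cluster.

    `score` is a hypothetical callable returning a homology score for two ids
    (for example, one combining sequence and structure similarity). It is an
    assumption for illustration, not the scoring used in the paper.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every pair whose score passes the threshold.
    for a, b in combinations(seq_ids, 2):
        if score(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in seq_ids:
        clusters.setdefault(find(s), []).append(s)
    # Deterministic output: members sorted within each cluster, clusters sorted.
    return sorted(sorted(c) for c in clusters.values())


# Toy, made-up pairwise scores; unlisted pairs score 0.0.
toy = {frozenset(("a", "b")): 0.9, frozenset(("b", "c")): 0.8, frozenset(("d", "e")): 0.7}
result = cluster_by_homology(
    ["a", "b", "c", "d", "e", "f"],
    lambda x, y: toy.get(frozenset((x, y)), 0.0),
    0.75,
)
print(result)  # [['a', 'b', 'c'], ['d'], ['e'], ['f']]
```

Single linkage lets sequences `a` and `c` join the same cluster through `b` even though they were never scored directly against each other. This transitivity is one reason clustering can gather diverged family members that a single pairwise search would miss, though it can also chain unrelated sequences together if the threshold is too permissive.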