

SnoReport: computational identification of snoRNAs with unknown targets.

Authors

Hertel Jana, Hofacker Ivo L, Stadler Peter F

Affiliation

Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Wien, Austria.

Publication information

Bioinformatics. 2008 Jan 15;24(2):158-64. doi: 10.1093/bioinformatics/btm464. Epub 2007 Sep 25.

Abstract


Unlike tRNAs and microRNAs, both classes of snoRNAs, which direct two distinct types of chemical modifications of uracil residues, have proved to be surprisingly difficult to find in genomic sequences. Most computational approaches so far have explicitly used the fact that snoRNAs predominantly target ribosomal RNAs and spliceosomal RNAs. The target is specified by a short stretch of sequence complementarity between the snoRNA and its target. This sequence complementarity to known targets contributes crucially to the sensitivity and specificity of snoRNA gene finding algorithms. The discovery of 'orphan' snoRNAs, which either have no known target or target ordinary protein-coding mRNAs, however, raises the question of whether this class of 'housekeeping' non-coding RNAs is much more widespread and might have a diverse set of regulatory functions. To approach this question, we present here a combination of RNA secondary structure prediction and machine learning that is designed to recognize the two major classes of snoRNAs, box C/D and box H/ACA snoRNAs, among ncRNA candidate sequences. The snoReport approach deliberately avoids any use of target information. We find that the combination of the conserved sequence boxes and secondary structure constraints as a pre-filter with SVM classifiers based on a small set of structural descriptors is sufficient for a reliable identification of snoRNAs. Tests of snoReport on data from several recent experimental surveys show that the approach is feasible; the application to a dataset from a large-scale comparative genomics survey for ncRNAs suggests that there are likely hundreds of previously undescribed 'orphan' snoRNAs still hidden in the human genome.

AVAILABILITY

The snoReport software is implemented in ANSI C. The source code is available under the GNU Public License at http://www.bioinf.uni-leipzig.de/Software/snoReport.
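
ILLUSTRATION

The sequence-box pre-filter mentioned in the abstract can be pictured with a small sketch. The Python code below is not taken from snoReport (which is written in ANSI C). It assumes the commonly cited consensus motifs for box C (RUGAUGA) and box D (CUGA), and the function name box_cd_candidates and the gap limits min_gap and max_gap are hypothetical values chosen only for illustration. It covers the motif search alone; the secondary structure constraints and the SVM classification stage are not modelled.

```python
import re

# Assumed textbook consensus motifs; snoReport's actual patterns,
# tolerated mismatches, and scoring may differ.
# Lookahead groups let overlapping motif occurrences all be reported.
BOX_C = re.compile(r"(?=([AG]UGAUGA))")
BOX_D = re.compile(r"(?=(CUGA))")
BOX_C_LEN = 7


def box_cd_candidates(seq, min_gap=30, max_gap=200):
    """Return (c_start, d_start) index pairs where a box D motif follows
    a box C motif at a plausible distance.

    min_gap and max_gap (nucleotides between the end of box C and the
    start of box D) are illustrative assumptions, not published values.
    """
    # Accept DNA or RNA input by normalising to the RNA alphabet.
    rna = seq.upper().replace("T", "U")
    c_sites = [m.start() for m in BOX_C.finditer(rna)]
    d_sites = [m.start() for m in BOX_D.finditer(rna)]
    return [
        (c, d)
        for c in c_sites
        for d in d_sites
        if min_gap <= d - (c + BOX_C_LEN) <= max_gap
    ]


if __name__ == "__main__":
    # Toy sequence: box C, a 40-nt spacer, then box D.
    toy = "GG" + "AUGAUGA" + "A" * 40 + "CUGA" + "GG"
    print(box_cd_candidates(toy))
```

In a pipeline of the kind the abstract describes, only candidates passing such a filter would go on to structure prediction and classification, which keeps the more expensive steps restricted to sequences that carry the conserved boxes.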

