Designing secondary structure profiles for fast ncRNA identification.

Author information

Sun Yanni, Buhler Jeremy

Affiliation

Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA.

Publication information

Comput Syst Bioinformatics Conf. 2008;7:145-56.

Abstract

Detecting non-coding RNAs (ncRNAs) in genomic DNA is an important part of annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for search. This cost can be reduced by using a filter to exclude sequence that is unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect nearly all ncRNA instances while excluding most irrelevant sequences remains challenging. This work proposes a systematic procedure to convert a CM for an ncRNA family to a secondary structure profile (SSP), which augments a conservation profile with secondary structure information but can still be efficiently scanned against long sequences. We use dynamic programming to estimate an SSP's sensitivity and FP rate, yielding an efficient, fully automated filter design algorithm. Our experiments demonstrate that designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families, including those with and without strong sequence conservation. For highly structured ncRNA families, including secondary structure conservation yields better performance than using primary sequence conservation alone.
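The filtering idea in the abstract can be illustrated with a much simpler stand-in: a plain ungapped sequence profile (no secondary structure) scanned over a sequence, with dynamic programming used to compute the probability that a window passes the score threshold. This is a minimal sketch and not the authors' SSP design algorithm. The toy motif `COLUMNS`, the uniform background, the integer score scaling, and the function names `make_scores`, `scan`, and `hit_probability` are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

# Hypothetical 3-column profile for the toy motif "ACG" (assumed, not from the paper).
COLUMNS = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
UNIFORM = {b: 0.25 for b in BASES}


def make_scores(columns, background=UNIFORM, scale=10):
    """Integer log-odds score per column and base (scaled, then rounded)."""
    return [
        {b: round(scale * math.log2(col[b] / background[b])) for b in BASES}
        for col in columns
    ]


def scan(scores, seq, threshold):
    """Start positions of windows whose summed profile score reaches the threshold."""
    width = len(scores)
    hits = []
    for start in range(len(seq) - width + 1):
        total = sum(col[seq[start + j]] for j, col in enumerate(scores))
        if total >= threshold:
            hits.append(start)
    return hits


def hit_probability(scores, threshold, emit):
    """P(window score >= threshold) when column j is drawn from emit[j].

    Dynamic programming over columns: dist maps each reachable partial
    score to its probability. Columns are assumed independent.
    """
    dist = {0: 1.0}
    for col_scores, col_probs in zip(scores, emit):
        nxt = {}
        for partial, prob in dist.items():
            for base, sc in col_scores.items():
                key = partial + sc
                nxt[key] = nxt.get(key, 0.0) + prob * col_probs[base]
        dist = nxt
    return sum(p for s, p in dist.items() if s >= threshold)


SCORES = make_scores(COLUMNS)
THRESHOLD = 40

# Only windows passing the cheap filter would be handed to the expensive model.
candidates = scan(SCORES, "TTACGTTACTTT", THRESHOLD)

# Per-window false-positive rate: windows drawn from the background.
fp_rate = hit_probability(SCORES, THRESHOLD, [UNIFORM] * len(SCORES))

# Per-window sensitivity: windows drawn from the profile's own columns.
sensitivity = hit_probability(SCORES, THRESHOLD, COLUMNS)
```

In this toy setting, raising `THRESHOLD` lowers both `fp_rate` and `sensitivity`. An automated filter design would search over profiles and thresholds to balance the two, which is the trade-off the abstract describes.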
