RNA search with decision trees and partial covariance models.

Author information

Smith Jennifer A

Affiliation

Electrical and Computer Engineering Department, Boise State University, 1910 University Ave., Boise, ID 83725-2075, USA.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2009 Jul-Sep;6(3):517-27. doi: 10.1109/TCBB.2008.120.

Abstract

The use of partial covariance models to search for RNA family members in genomic sequence databases is explored. The partial models are formed from contiguous subranges of the columns of the overall RNA family multiple alignment. A binary decision-tree framework is presented for choosing the order in which to apply the partial models and the score thresholds on which to base the decisions. The decision trees are chosen to minimize computation time subject to the constraint that all of the training sequences are passed to the full covariance model for final evaluation. Computational intelligence methods are suggested for selecting the decision tree, since the tree can be quite complex and there is no obvious method for building it in these cases. Experimental results from seven RNA families show execution times that are 0.066 to 0.268 times those of the full covariance model alone. Tests on the full sets of known sequences for each family show that at least 95 percent of these sequences are found for two families and 100 percent for the other five. Since the full covariance model is run on all sequences accepted by the partial-model decision tree, the false alarm rate is at least as low as that of the full model alone.
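The filtering idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the node layout, the names `Node`, `classify`, and `search`, and the toy scorers `stem_score` and `loop_score` are all assumptions made for illustration, standing in for real partial covariance models. Each internal node scores a candidate window with one cheap partial model and branches on a threshold; only windows that reach a `FULL_MODEL` leaf are handed to the expensive full model.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

REJECT = "reject"          # leaf: discard the window without running the full model
FULL_MODEL = "full_model"  # leaf: pass the window on to the full model

@dataclass
class Node:
    """Internal node of a binary decision tree (hypothetical layout)."""
    score: Callable[[str], float]       # stand-in for a partial-model scorer
    threshold: float                    # decision threshold for this node
    below: Union["Node", str]           # branch taken when score < threshold
    at_or_above: Union["Node", str]     # branch taken when score >= threshold

def classify(tree: Union[Node, str], seq: str) -> str:
    """Walk the tree until a leaf (REJECT or FULL_MODEL) is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.at_or_above if node.score(seq) >= node.threshold else node.below
    return node

def search(tree, full_score, full_threshold: float, windows: List[str]) -> List[str]:
    """Run the full model only on windows accepted by the decision tree."""
    return [w for w in windows
            if classify(tree, w) == FULL_MODEL and full_score(w) >= full_threshold]

# Toy scorers (assumptions, not covariance models).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}

def stem_score(seq: str) -> float:
    """Count complementary pairs between the first and last four bases."""
    return float(sum((a, b) in PAIRS for a, b in zip(seq[:4], reversed(seq[-4:]))))

def loop_score(seq: str) -> float:
    """Count adenines, as a stand-in for a second partial model."""
    return float(seq.count("A"))

def full_score(seq: str) -> float:
    """Stand-in for the full model: here simply the sum of both parts."""
    return stem_score(seq) + loop_score(seq)

# A two-level tree: check the stem first, then the loop.
tree = Node(stem_score, 3.0, REJECT, Node(loop_score, 2.0, REJECT, FULL_MODEL))
windows = ["GGGCAAAAGCCC", "AAAAAAAAAAAA", "GGGCUUUUGCCC"]
hits = search(tree, full_score, 6.0, windows)
print(hits)
```

Because every hit is still scored by the full model, a filter of this shape can only remove candidates, never add them, which is the reason the abstract gives for the false alarm rate being no higher than that of the full model alone. Choosing the order of the nodes and their thresholds is the optimization problem the paper addresses; the sketch fixes them by hand.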
