RBR: library-less repeat detection for ESTs.

Author information

Malde Ketil, Schneeberger Korbinian, Coward Eivind, Jonassen Inge

Affiliation

Computational Biology Unit, Bergen Centre for Computational Sciences, University of Bergen, Norway.

Publication information

Bioinformatics. 2006 Sep 15;22(18):2232-6. doi: 10.1093/bioinformatics/btl368. Epub 2006 Jul 12.

DOI:10.1093/bioinformatics/btl368

PMID:16837527

Abstract

MOTIVATION

Repeat sequences in ESTs are a source of problems, in particular for clustering. ESTs are therefore commonly masked against a library of known repeats. High quality repeat libraries are available for the widely studied organisms, but for most other organisms the lack of such libraries is likely to compromise the quality of EST analysis.

RESULTS

We present a fast, flexible and library-less method for masking repeats in EST sequences, based on match statistics within the EST collection. The method is not linked to a particular clustering algorithm. Extensive testing on datasets using different clustering methods and a genomic mapping as reference shows that this method gives results that are better than or as good as those obtained using RepeatMasker with a repeat library.
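The abstract does not say which match statistic RBR computes, so the following is only a generic illustration of masking repeats from statistics gathered within the EST collection itself, with no external repeat library. It is not the authors' implementation. It assumes a simple rule: a word of length `k` that occurs in more than `threshold` different sequences is treated as repetitive and soft-masked (lowercased). The function names, `k`, `threshold`, and the toy sequences are all hypothetical.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count, for each k-mer, the number of sequences that contain it."""
    counts = Counter()
    for s in seqs:
        # A set ensures each k-mer is counted at most once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def mask_frequent(seq, counts, k, threshold):
    """Lowercase every base covered by a k-mer found in more than `threshold` sequences."""
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] > threshold:
            for j in range(i, i + k):
                masked[j] = masked[j].lower()
    return "".join(masked)


# Toy collection (hypothetical data): the first three sequences share "ACGTACGT".
ests = ["ACGTACGTTTGA", "GGACGTACGTCC", "TTACGTACGTAA", "CCCCGGGGTTTT"]
k, threshold = 8, 2
counts = count_kmers(ests, k)
masked = [mask_frequent(s, counts, k, threshold) for s in ests]
print(masked)
```

In this sketch the shared word is lowercased in the three sequences that contain it and the fourth sequence is left untouched. The masked output could then be passed to any clustering tool that ignores soft-masked bases, which matches the abstract's point that the masking step is not tied to a particular clustering algorithm.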

AVAILABILITY

The implementation of RBR is available under the terms of the GPL from http://www.ii.uib.no/~ketil/bioinformatics

CONTACT

ketil.malde@bccs.uib.no

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

RBR: library-less repeat detection for ESTs.

Bioinformatics. 2006 Sep 15;22(18):2232-6. doi: 10.1093/bioinformatics/btl368. Epub 2006 Jul 12.

WindowMasker: window-based masker for sequenced genomes.

Bioinformatics. 2006 Jan 15;22(2):134-41. doi: 10.1093/bioinformatics/bti774. Epub 2005 Nov 15.

HomologMiner: looking for homologous genomic groups in whole genomes.

Bioinformatics. 2007 Apr 15;23(8):917-25. doi: 10.1093/bioinformatics/btm048. Epub 2007 Feb 18.

Tandem repeats over the edit distance.

Bioinformatics. 2007 Jan 15;23(2):e30-5. doi: 10.1093/bioinformatics/btl309.

PALMA: mRNA to genome alignments using large margin algorithms.

Bioinformatics. 2007 Aug 1;23(15):1892-900. doi: 10.1093/bioinformatics/btm275. Epub 2007 May 30.

GARD: a genetic algorithm for recombination detection.

Bioinformatics. 2006 Dec 15;22(24):3096-8. doi: 10.1093/bioinformatics/btl474. Epub 2006 Nov 16.

Bioinformatic analysis of exon repetition, exon scrambling and trans-splicing in humans.

Bioinformatics. 2006 Mar 15;22(6):692-8. doi: 10.1093/bioinformatics/bti795. Epub 2005 Nov 24.

WebTraceMiner: a web service for processing and mining EST sequence trace files.

Nucleic Acids Res. 2007 Jul;35(Web Server issue):W137-42. doi: 10.1093/nar/gkm299. Epub 2007 May 8.

ESTuber db: an online database for Tuber borchii EST sequences.

BMC Bioinformatics. 2007 Mar 8;8 Suppl 1(Suppl 1):S13. doi: 10.1186/1471-2105-8-S1-S13.

A software program combining sequence motif searches with keywords for finding repeats containing DNA sequences.

Bioinformatics. 2004 Dec 12;20(18):3379-86. doi: 10.1093/bioinformatics/bth410. Epub 2004 Jul 15.

Cited by

Transcriptome analysis of Corvus splendens reveals a repertoire of antimicrobial peptides.

Sci Rep. 2023 Oct 31;13(1):18728. doi: 10.1038/s41598-023-45875-w.

Towards decrypting cryptobiosis--analyzing anhydrobiosis in the tardigrade Milnesium tardigradum using transcriptome sequencing.

PLoS One. 2014 Mar 20;9(3):e92663. doi: 10.1371/journal.pone.0092663. eCollection 2014.

Filtering duplicate reads from 454 pyrosequencing data.

Bioinformatics. 2013 Apr 1;29(7):830-6. doi: 10.1093/bioinformatics/btt047. Epub 2013 Feb 1.

Maternal 3'UTRs: from egg to onset of zygotic transcription in Atlantic cod.

BMC Genomics. 2012 Sep 1;13:443. doi: 10.1186/1471-2164-13-443.

Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

PLoS One. 2010 May 19;5(5):e10720. doi: 10.1371/journal.pone.0010720.

k-link EST clustering: evaluating error introduced by chimeric sequences under different degrees of linkage.

Bioinformatics. 2009 Sep 15;25(18):2302-8. doi: 10.1093/bioinformatics/btp410. Epub 2009 Jul 1.

Characterization of an Atlantic cod (Gadus morhua) embryonic stem cell cDNA library.

BMC Res Notes. 2009 May 6;2:74. doi: 10.1186/1756-0500-2-74.

Repeats and EST analysis for new organisms.

BMC Genomics. 2008 Jan 18;9:23. doi: 10.1186/1471-2164-9-23.
