Automated clustering and assembly of large EST collections.

Author information

Yee D P, Conklin D

Affiliation

ZymoGenetics, Inc., Seattle, WA 98102, USA. yee,

Publication information

Proc Int Conf Intell Syst Mol Biol. 1998;6:203-11.

PMID: 9783226

Abstract

The availability of large EST (Expressed Sequence Tag) databases has led to a revolution in the way new genes are cloned. Difficulties arise, however, due to high error rates and redundancy of raw EST data. For these reasons, one of the first tasks performed by a scientist investigating any EST of interest is to gather contiguous ESTs and assemble them into a larger virtual cDNA. The REX (Recursive EST eXtender) algorithm described in this paper completely automates this process by finding ESTs that can be clustered on the basis of overlapping bases, and then assembling the contigs into a consensus sequence. By combining the clustering and assembly steps, REX can quickly generate assemblies from EST databases that are frequently updated without having to preprocess the data. A consensus assembly method is used to correct miscalled bases and remove indel errors. A unique feature of this method is that it addresses the issues of splice variants and unspliced cDNA data. Since REX is a fast greedy algorithm, it can address the problem of generating a database of assembled sequences from very large collections of EST data. A procedure is described for creating and maintaining an Assembled Consensus EST database (ACE) that is useful for characterizing the large body of data that exists in EST databases.
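
The abstract describes REX only at a high level. It greedily extends a contig by recruiting ESTs that overlap it, then builds a consensus so that miscalled bases are outvoted. The sketch below illustrates that general idea under simplifying assumptions that are not from the paper:

- reads are taken on one strand only, with no reverse complements;
- placement is ungapped, so indel correction is not modeled;
- splice variants and unspliced cDNA are ignored;
- the recursive extension is written as a loop.

The function names (`best_placement`, `consensus_of`, `rex_like_extend`) and the parameters `min_overlap` and `max_mismatch` are hypothetical.

```python
from collections import Counter


def best_placement(consensus, read, min_overlap, max_mismatch):
    """Return (overlap, offset) for the longest ungapped overlap of `read`
    against `consensus` with at most `max_mismatch` mismatches, or None.

    `offset` is in consensus coordinates and may be negative, meaning the
    read hangs off the left end of the current contig.
    """
    best = None
    for offset in range(min_overlap - len(read), len(consensus) - min_overlap + 1):
        start = max(0, offset)
        end = min(len(consensus), offset + len(read))
        overlap = end - start
        if overlap < min_overlap:
            continue
        mismatches = sum(
            1 for i in range(start, end) if consensus[i] != read[i - offset]
        )
        if mismatches <= max_mismatch and (best is None or overlap > best[0]):
            best = (overlap, offset)
    return best


def consensus_of(placed):
    """Majority-vote consensus over (global_offset, sequence) pairs.

    Returns (leftmost_offset, consensus_string). Ties between bases in a
    column are resolved arbitrarily in this sketch.
    """
    left = min(off for off, _ in placed)
    right = max(off + len(seq) for off, seq in placed)
    columns = [Counter() for _ in range(right - left)]
    for off, seq in placed:
        for i, base in enumerate(seq):
            columns[off - left + i][base] += 1
    return left, "".join(col.most_common(1)[0][0] for col in columns)


def rex_like_extend(seed, reads, min_overlap=20, max_mismatch=2):
    """Greedily grow a contig from `seed` (hypothetical helper, not REX itself).

    Each round recruits the unplaced read with the longest acceptable overlap,
    then recomputes the consensus, so later reads are matched against the
    error-corrected contig. Stops when no remaining read overlaps.
    Returns (consensus, member_reads, unplaced_reads).
    """
    placed = [(0, seed)]          # reads with offsets in a fixed global frame
    origin, consensus = 0, seed   # global coordinate of consensus[0]
    pool = list(reads)
    while True:
        candidates = []
        for idx, read in enumerate(pool):
            hit = best_placement(consensus, read, min_overlap, max_mismatch)
            if hit is not None:
                candidates.append((hit[0], idx, hit[1]))
        if not candidates:
            break
        _, idx, offset = max(candidates, key=lambda c: c[0])
        placed.append((origin + offset, pool.pop(idx)))
        origin, consensus = consensus_of(placed)
    return consensus, [seq for _, seq in placed], pool


if __name__ == "__main__":
    true_seq = "ATGGCGTACCTTAGCAGTCCGAATTGCACG"
    ests = [
        true_seq[8:22],
        true_seq[14:30],
        "TACCTTGGCAGTCC",   # true_seq[6:20] with an A->G miscall
        "G" * 12,           # unrelated read that should not be clustered
    ]
    contig, members, leftover = rex_like_extend(
        true_seq[0:14], ests, min_overlap=8, max_mismatch=1
    )
    print(contig, len(members), leftover)
```

In this toy run, the A->G miscall in the third read is outvoted at that column by the seed and the first read, which both carry A. The contig therefore comes out identical to `true_seq`, and the poly-G read is returned unplaced. Rescanning every read in each round costs quadratic time, so this sketch is only illustrative. A tool run over very large EST collections would need an index to find candidate overlaps quickly.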
