Tracembler--software for in-silico chromosome walking in unassembled genomes.

Author information

Dong Qunfeng, Wilkerson Matthew D, Brendel Volker

Affiliation

Department of Genetics, Development & Cell Biology, Iowa State University, Ames, Iowa 50011, USA.

Publication information

BMC Bioinformatics. 2007 May 9;8:151. doi: 10.1186/1471-2105-8-151.

Abstract

BACKGROUND

Whole genome shotgun sequencing produces increasingly higher coverage of a genome with random sequence reads. Progressive whole genome assembly and eventual finishing sequencing is a process that typically takes several years for large eukaryotic genomes. In the interim, all sequence reads of public sequencing projects are made available in repositories such as the NCBI Trace Archive. For a particular locus, sequencing coverage may be high enough early on to produce a reliable local genome assembly. We have developed software, Tracembler, that facilitates in silico chromosome walking by recursively assembling reads of a selected species from the NCBI Trace Archive starting with reads that significantly match sequence seeds supplied by the user.

RESULTS

Tracembler takes one or multiple DNA or protein sequence(s) as input to the NCBI Trace Archive BLAST engine to identify matching sequence reads from a species of interest. The BLAST searches are carried out recursively such that BLAST matching sequences identified in previous rounds of searches are used as new queries in subsequent rounds of BLAST searches. The recursive BLAST search stops when either no more new matching sequences are found, a given maximal number of queries is exhausted, or a specified maximum number of rounds of recursion is reached. All the BLAST matching sequences are then assembled into contigs based on significant sequence overlaps using the CAP3 program. We demonstrate the validity of the concept and software implementation with an example of successfully recovering a full-length Chrm2 gene as well as its upstream and downstream genomic regions from Rattus norvegicus reads. In a second example, a query with two adjacent Medicago truncatula genes as seeds resulted in a contig that likely identifies the microsyntenic homologous soybean locus.
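The recursive search loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Tracembler's implementation.

- The search step is a caller-supplied function. In the example it is a toy exact k-mer matcher, `kmer_search`, which stands in for BLAST against the NCBI Trace Archive.
- The names `recursive_search`, `max_queries` and `max_rounds`, and their default values, are hypothetical.
- "Maximal number of queries" is assumed to mean the total number of searches issued.
- The CAP3 assembly step is not shown.

```python
from typing import Callable, Dict, Iterable, List, Set

# A search function maps a query sequence to IDs of matching reads.
# In Tracembler this role is played by BLAST against the NCBI Trace
# Archive; here it is an assumed, caller-supplied stand-in.
SearchFn = Callable[[str], Iterable[str]]


def kmer_search(reads: Dict[str, str], k: int = 8) -> SearchFn:
    """Toy stand-in for BLAST: a read matches if it shares an exact k-mer."""
    def search(query: str) -> List[str]:
        query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
        return [
            rid for rid, seq in reads.items()
            if any(seq[i:i + k] in query_kmers for i in range(len(seq) - k + 1))
        ]
    return search


def recursive_search(
    seeds: List[str],
    search: SearchFn,
    reads: Dict[str, str],
    max_queries: int = 100,  # hypothetical budget on total searches issued
    max_rounds: int = 10,    # hypothetical limit on rounds of recursion
) -> Set[str]:
    """Collect IDs of reads reachable from the seeds by repeated searching.

    Stops when a round finds no new reads, when max_queries searches have
    been issued, or when max_rounds rounds have run.
    """
    found: Set[str] = set()
    queries = list(seeds)  # round 1 queries are the user-supplied seeds
    n_queries = 0
    for _ in range(max_rounds):
        new_ids: Set[str] = set()
        for query in queries:
            if n_queries >= max_queries:
                return found | new_ids  # query budget exhausted
            n_queries += 1
            new_ids.update(rid for rid in search(query) if rid not in found)
        if not new_ids:
            break  # no new matching reads: the walk has converged
        found |= new_ids
        # Reads found in this round become the next round's queries.
        queries = [reads[rid] for rid in sorted(new_ids)]
    return found


# Example: reads r1..r3 tile one locus with 8-base overlaps; r4 is unrelated.
READS = {
    "r1": "A" * 8 + "C" * 8,
    "r2": "C" * 8 + "G" * 8,
    "r3": "G" * 8 + "T" * 8,
    "r4": "ACGTACGTACGT",
}

if __name__ == "__main__":
    hits = recursive_search(["A" * 8], kmer_search(READS), READS)
    print(sorted(hits))  # ['r1', 'r2', 'r3']
```

The walk proceeds in rounds:

- Round 1: the seed matches only `r1`.
- Round 2: `r1`, used as a query, reaches `r2` through the shared `CCCCCCCC`.
- Round 3: `r2` reaches `r3`.
- Round 4: nothing new is found, so the loop stops.

The unrelated read `r4` is never reached. In a real pipeline, the collected reads would then be passed to an assembler such as CAP3.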

CONCLUSION

Tracembler streamlines the process of recursive database searches, sequence assembly, and gene identification in resulting contigs in attempts to identify homologous loci of genes of interest in species with emerging whole genome shotgun reads. A web server hosting Tracembler is provided at http://www.plantgdb.org/tool/tracembler/, and the software is also freely available from the authors for local installations.
