

An effective approach for analyzing "prefinished" genomic sequence data.

Authors

Kuehl P M, Weisemann J M, Touchman J W, Green E D, Boguski M S

Affiliation

University of Maryland, Department of Molecular and Cellular Biology, Baltimore, Maryland 21201, USA.

Publication

Genome Res. 1999 Feb;9(2):189-94.

Abstract

Ongoing efforts to sequence the human genome are already generating large amounts of data, with substantial increases anticipated over the next few years. In most cases, a shotgun sequencing strategy is being used, which rapidly yields most of the primary sequence in incompletely assembled sequence contigs ("prefinished" sequence) and more slowly produces the final, completely assembled sequence ("finished" sequence). Thus, in general, prefinished sequence is produced in excess of finished sequence, and this trend is certain to continue and even accelerate over the next few years. Even at a prefinished stage, genomic sequence represents a rich source of important biological information that is of great interest to many investigators. However, analyzing such data is a challenging and daunting task, both because of its sheer volume and because it can change on a day-by-day basis. To facilitate the discovery and characterization of genes and other important elements within prefinished sequence, we have developed an analytical strategy and system that uses readily available software tools in new combinations. Implementation of this strategy for the analysis of prefinished sequence data from human chromosome 7 has demonstrated that this is a convenient, inexpensive, and extensible solution to the problem of analyzing the large amounts of preliminary data being produced by large-scale sequencing efforts. Our approach is accessible to any investigator who wishes to assimilate additional information about particular sequence data en route to developing richer annotations of a finished sequence.



