Rehm B H
Institut für Mikrobiologie der Westfälischen Wilhelms-Universität Münster, Germany.
Appl Microbiol Biotechnol. 2001 Dec;57(5-6):579-92. doi: 10.1007/s00253-001-0844-0.
The development of efficient DNA sequencing methods has made it possible to determine the complete genome sequences of (to date) 55 prokaryotes and 5 eukaryotic organisms, as well as the sequences of 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available, and even more will be forthcoming in the near future. Analysis of this overwhelming amount of data requires bioinformatic tools in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include understanding gene regulation and reconstructing metabolic pathways, including the use of DNA chip technology, which holds tremendous potential for biomedicine and the biotechnological production of valuable compounds. The sheer volume of information often confuses scientists. This review is intended as a guide to choosing the most efficient way to analyze a new sequence, or to collect information on a gene or protein of interest, using current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, based mainly on the similarity of the deduced amino acid sequence to entries in the currently available and growing biological databases, are discussed.