Ganesan Hamilton, Rakitianskaia Anna S, Davenport Colin F, Tümmler Burkhard, Reva Oleg N
Department of Biochemistry, Bioinformatics and Computational Biology Unit, University of Pretoria, Lynnwood Road, Hillcrest, Pretoria, 0002, South Africa.
BMC Bioinformatics. 2008 Aug 7;9:333. doi: 10.1186/1471-2105-9-333.
Data mining in large DNA sequences is a major challenge in microbial genomics and bioinformatics. Oligonucleotide usage (OU) patterns provide a wealth of information for large-scale sequence analysis and visualization. The purpose of this research was to make OU statistical analysis available as a novel web-based tool for functional genomics and annotation. The tool is also available as a downloadable package.
The SeqWord Genome Browser (SWGB) was developed to visualize the natural compositional variation of DNA sequences. By comparing local and global OU patterns, the applet also identifies divergent genomic regions, both in annotated sequences of bacterial chromosomes, plasmids, phages and viruses and in raw DNA sequences prior to annotation. It allows fast and reliable identification of clusters of horizontally transferred genomic islands, large multi-domain genes and genes for ribosomal RNA. Within the majority of genomic fragments (also termed the genomic core sequence), regions enriched with housekeeping genes or ribosomal proteins and regions rich in pseudogenes or genetic vestiges may be contrasted.
The SWGB applet presents a comprehensive set of OU statistical parameters calculated for a range of bacterial species, plasmids and phages. It is available on the Internet at http://www.bi.up.ac.za/SeqWord/mhhapplet.php.
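The idea of comparing local and global OU patterns can be illustrated with a minimal Python sketch. It is not the SWGB implementation and does not reproduce the applet's statistical parameters. The function names (`kmer_freqs`, `ou_distance`, `scan`), the default word length, the window and step sizes, and the simple absolute-difference distance are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product


def kmer_freqs(seq, k=4):
    """Relative frequency of every DNA word of length k in seq.

    Words containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def ou_distance(local, global_):
    """Half the summed absolute frequency difference (0 = identical, 1 = disjoint).

    Hypothetical stand-in for a real OU divergence statistic.
    """
    return 0.5 * sum(abs(local[w] - global_[w]) for w in global_)


def scan(seq, window=8000, step=2000, k=4):
    """Slide a window along seq and score each window against the whole sequence."""
    global_pattern = kmer_freqs(seq, k)
    results = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        local_pattern = kmer_freqs(seq[start:start + window], k)
        results.append((start, ou_distance(local_pattern, global_pattern)))
    return results


if __name__ == "__main__":
    # Toy sequence: a compositionally atypical insert between two similar flanks.
    toy = "ACGT" * 500 + "A" * 400 + "ACGT" * 500
    for start, score in scan(toy, window=400, step=400, k=2):
        print(start, round(score, 3))
```

In this toy run the window covering the atypical insert receives the highest score, which is the kind of signal that would flag a candidate divergent region for closer inspection. Real genomes would need longer windows and a more careful statistic than this sketch uses.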