GATA: a graphic alignment tool for comparative sequence analysis.

Author information

Nix David A, Eisen Michael B

Affiliation

Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA.

Publication information

BMC Bioinformatics. 2005 Jan 17;6:9. doi: 10.1186/1471-2105-6-9.

Abstract

BACKGROUND

Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein-coding sequences: functional constraints on proteins provide strong selective pressure against sequence inversions and minimize sequence duplications and feature shuffling. For non-coding sequences, this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and the text output of dynamic programming aligners fail to provide an intuitive means of visualizing DNA alignments.

RESULTS

To address some of these issues, we created a stand-alone, platform-independent graphic alignment tool for comparative sequence analysis (GATA, http://gata.sourceforge.net/). GATA uses the NCBI-BLASTN program and extensive post-processing to identify all small sub-alignments above a low cut-off score. Each sub-alignment is drawn as two shaded boxes, one on each sequence, connected by a line and positioned using the coordinate system of its parent sequence. Shading and colour indicate score and orientation. A variety of options exist for querying, modifying and retrieving conserved sequence elements. Extensive gene annotation can be added to both sequences using a standardized General Feature Format (GFF) file.
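The idea of collecting every small sub-alignment on both strands, each with its own coordinates, orientation and score, can be illustrated with a minimal sketch. This is not GATA's method: GATA relies on NCBI-BLASTN plus post-processing, whereas the code below substitutes a naive search for exact ungapped matches and uses match length as a stand-in score. The function names, the dictionary fields, and the `k` and `min_score` parameters are all assumptions made for illustration.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def exact_matches(a, b, k):
    """Find maximal exact matches of length >= k between a and b.

    Returns (a_start, b_start, length) tuples with 0-based starts.
    """
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)

    matches = []
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            # Skip seeds that lie inside a match starting further left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = k
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches


def sub_alignments(seq1, seq2, k=8, min_score=8):
    """Collect ungapped matches on both strands (hypothetical helper).

    Coordinates are 0-based, end-exclusive, and always expressed on the
    forward strand of each parent sequence. Score is the match length.
    """
    hits = []
    for i, j, n in exact_matches(seq1, seq2, k):
        if n >= min_score:
            hits.append({"start1": i, "end1": i + n,
                         "start2": j, "end2": j + n,
                         "strand": "+", "score": n})

    rc = reverse_complement(seq2)
    for i, j, n in exact_matches(seq1, rc, k):
        if n >= min_score:
            # Map the reverse-strand position back to forward coordinates.
            start2 = len(seq2) - (j + n)
            hits.append({"start1": i, "end1": i + n,
                         "start2": start2, "end2": start2 + n,
                         "strand": "-", "score": n})
    return sorted(hits, key=lambda h: (h["start1"], h["start2"]))
```

Each returned record carries what a box-and-line drawing needs: one interval per sequence, an orientation to pick the colour, and a score to pick the shading. Because matches are collected independently rather than chained into one collinear path, an inverted or relocated element appears as just another record.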

CONCLUSIONS

GATA uses the NCBI-BLASTN program in conjunction with post-processing to exhaustively align two DNA sequences. It provides researchers with a fine-grained alignment and visualization tool well suited to pairwise analysis of non-coding sequences of 0-200 kb. It functions independently of sequence feature ordering or orientation, and readily visualizes both large and small sequence inversions, duplications, and segment shuffling. Since the alignment is visual and does not contain gaps, gene annotation can be added to both sequences to create a thoroughly descriptive picture of DNA conservation for comparative sequence analysis.
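As a rough illustration of how annotation from a GFF file might be related to a conserved region, the sketch below parses GFF text and lists the features that overlap a given interval. It assumes the common tab-separated GFF layout (seqid, source, type, start, end, score, strand, phase, optional attributes) with 1-based inclusive coordinates. The abstract does not say which GFF fields GATA reads, and the names `Feature`, `parse_gff` and `overlapping` are hypothetical.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str
    attributes: str


def parse_gff(text: str) -> List[Feature]:
    """Parse GFF text, skipping blank lines, comments and short rows."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a complete feature row
        attributes = cols[8] if len(cols) > 8 else ""
        features.append(Feature(cols[0], cols[1], cols[2],
                                int(cols[3]), int(cols[4]),
                                cols[6], attributes))
    return features


def overlapping(features: List[Feature], start: int, end: int) -> List[Feature]:
    """Return features that share at least one base with [start, end]."""
    return [f for f in features if f.start <= end and f.end >= start]


sample = (
    "##gff-version 3\n"
    "seqA\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "seqA\texample\tenhancer\t1200\t1500\t.\t-\t.\tID=enh1\n"
)
annotated = overlapping(parse_gff(sample), 850, 1250)
```

Because an ungapped visual alignment keeps every box in the coordinates of its parent sequence, a lookup like this needs no coordinate translation between the alignment and the annotation.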

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f338/546196/d709141d1be2/1471-2105-6-9-1.jpg
