Genomic multiple sequence alignments: refinement using a genetic algorithm.

Authors

Wang Chunlin, Lefkowitz Elliot J

Affiliation

Department of Microbiology, University of Alabama, Birmingham, Alabama 35294-2170, USA.

Publication

BMC Bioinformatics. 2005 Aug 8;6:200. doi: 10.1186/1471-2105-6-200.

Abstract

BACKGROUND

Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics, the practice of comparing genomic sequences from different species, plays an increasingly important role both in understanding the genotypic differences between species that result in phenotypic differences and in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most use heuristic strategies to identify a series of strong sequence similarities, which then serve as anchors for aligning the regions between them. The resulting alignment is globally correct but in many cases locally suboptimal. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of the alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program.
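The refinement step described above can be illustrated with a minimal sketch of a genetic algorithm acting on one small alignment block. This is not the GenAlignRefine implementation: the abstract does not specify its operators or parameters, so the function names (`sum_of_pairs`, `crossover`, `mutate`, `refine`), the gap-moving mutation, the row-wise crossover, and the population settings are all assumptions made for illustration. In particular, a simple count of matching residue pairs stands in for the COFFEE score, which in practice is computed against a library of pairwise alignments.

```python
import random

GAP = "-"


def sum_of_pairs(alignment):
    """Stand-in objective (NOT the COFFEE score): matching residue pairs per column."""
    score = 0
    for column in zip(*alignment):
        residues = [c for c in column if c != GAP]
        score += sum(
            residues[i] == residues[j]
            for i in range(len(residues))
            for j in range(i + 1, len(residues))
        )
    return score


def crossover(parent_a, parent_b, rng):
    """Row-wise crossover: each aligned row is inherited from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))


def mutate(alignment, rng):
    """Move one gap within one row; row length and residue order are preserved."""
    rows = list(alignment)
    gapped_rows = [i for i, row in enumerate(rows) if GAP in row]
    if not gapped_rows:
        return tuple(rows)
    i = rng.choice(gapped_rows)
    row = list(rows[i])
    gap_positions = [k for k, c in enumerate(row) if c == GAP]
    row.pop(rng.choice(gap_positions))
    row.insert(rng.randrange(len(row) + 1), GAP)
    rows[i] = "".join(row)
    return tuple(rows)


def refine(alignment, generations=200, population_size=20, seed=0):
    """Evolve a population of alignments, always keeping the best-scoring ones."""
    rng = random.Random(seed)
    population = [tuple(alignment)] * population_size
    for _ in range(generations):
        offspring = [
            mutate(crossover(rng.choice(population), rng.choice(population), rng), rng)
            for _ in range(population_size)
        ]
        # Elitist selection: parents compete with offspring, so the best
        # score in the population can never decrease.
        pool = population + offspring
        pool.sort(key=sum_of_pairs, reverse=True)
        population = pool[:population_size]
    return list(population[0])


if __name__ == "__main__":
    block = ["ACGT-A", "AC-GTA", "ACGTA-"]
    refined = refine(block)
    print(sum_of_pairs(block), "->", sum_of_pairs(refined))
    print("\n".join(refined))
```

Because every individual descends from the same starting block, each row keeps its ungapped sequence and its length, which is why rows can be exchanged between parents without breaking the alignment.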

RESULTS

We tested GenAlignRefine on a Linux cluster by refining sequences from a simulation and by refining a multiple alignment of 15 Orthopoxvirus genomic sequences, approximately 260,000 nucleotides in length, that had initially been aligned by Multi-LAGAN. A 40-processor Linux cluster took approximately 150 minutes to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only slightly, but, significantly, the overall alignment length decreased at the same time, through the removal of gaps, by approximately 200 gapped regions representing roughly 1,300 gaps.
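The quantities reported here (alignment length, number of gaps, number of gapped regions, and identity) can be computed from an alignment along the following lines. This is a sketch under stated assumptions: the abstract does not define how identity or gapped regions were measured, so the definitions below (identity as the fraction of gap-free columns with a single residue, a gapped region as a run of consecutive gaps within one row) and the function names are hypothetical.

```python
import re

GAP = "-"


def drop_all_gap_columns(alignment):
    """Remove columns that contain only gaps, which shortens the alignment."""
    kept = [col for col in zip(*alignment) if any(c != GAP for c in col)]
    if not kept:
        return ["" for _ in alignment]
    return ["".join(chars) for chars in zip(*kept)]


def gap_statistics(alignment):
    """Return (total gap characters, number of gap runs) summed over all rows."""
    gaps = sum(row.count(GAP) for row in alignment)
    regions = sum(len(re.findall(r"-+", row)) for row in alignment)
    return gaps, regions


def fraction_identical_columns(alignment):
    """Assumed identity measure: gap-free columns in which all residues agree."""
    columns = list(zip(*alignment))
    if not columns:
        return 0.0
    identical = sum(1 for col in columns if GAP not in col and len(set(col)) == 1)
    return identical / len(columns)


if __name__ == "__main__":
    before = ["AC--GT", "ACT-GT", "AC--GA"]
    after = drop_all_gap_columns(before)
    for name, aln in (("before", before), ("after", after)):
        print(name, len(aln[0]), gap_statistics(aln), fraction_identical_columns(aln))
```

With these definitions, removing gaps shortens the alignment and can raise the identity fraction even when the number of identical columns is unchanged, because the denominator shrinks.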

CONCLUSION

We have implemented a genetic algorithm in parallel mode to optimize multiple genomic sequence alignments initially generated by various alignment tools. Benchmarking experiments showed that the refinement algorithm improved genomic sequence alignments within a reasonable period of time.
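Since poorly aligned regions can be refined independently of one another, the work divides naturally among workers. The sketch below shows only that scatter-and-gather pattern and is not the authors' cluster implementation: the criterion for a "fuzzy" column, the placeholder `refine_region` worker, and all function names are assumptions. A thread pool is used to keep the example self-contained; CPU-bound refinement on a real cluster would more plausibly use separate processes or message passing.

```python
from concurrent.futures import ThreadPoolExecutor

GAP = "-"


def column_is_fuzzy(column):
    """Assumed criterion: a column is fuzzy if it has a gap or a mismatch."""
    return GAP in column or len(set(column)) > 1


def find_fuzzy_regions(alignment):
    """Return half-open (start, end) column ranges of consecutive fuzzy columns."""
    regions, start = [], None
    for index, column in enumerate(zip(*alignment)):
        if column_is_fuzzy(column):
            if start is None:
                start = index
        elif start is not None:
            regions.append((start, index))
            start = None
    if start is not None:
        regions.append((start, len(alignment[0])))
    return regions


def refine_region(block):
    """Placeholder worker: drops all-gap columns. A real worker would realign
    the block and run the genetic algorithm on it."""
    kept = [col for col in zip(*block) if any(c != GAP for c in col)]
    if not kept:
        return ["" for _ in block]
    return ["".join(chars) for chars in zip(*kept)]


def refine_all(alignment, max_workers=4):
    """Refine every fuzzy region concurrently, then stitch the alignment back."""
    regions = find_fuzzy_regions(alignment)
    blocks = [[row[s:e] for row in alignment] for s, e in regions]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        refined_blocks = list(pool.map(refine_region, blocks))  # keeps input order

    result = ["" for _ in alignment]
    cursor = 0
    for (start, end), block in zip(regions, refined_blocks):
        for i, row in enumerate(alignment):
            result[i] += row[cursor:start] + block[i]
        cursor = end
    return [done + row[cursor:] for done, row in zip(result, alignment)]


if __name__ == "__main__":
    alignment = ["AC--GTTA", "AC-TGTCA", "AC--GTTA"]
    print(find_fuzzy_regions(alignment))
    print("\n".join(refine_all(alignment)))
```

Well-aligned columns between the fuzzy regions are copied through unchanged, so only the regions handed to workers can alter the final alignment.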


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/751b/1208854/446bf52e169c/1471-2105-6-200-1.jpg
