Cgaln: fast and space-efficient whole-genome alignment.

Affiliation

Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto-shi, Kyoto 606-8501, Japan.

Publication

BMC Bioinformatics. 2010 Apr 30;11:224. doi: 10.1186/1471-2105-11-224.

DOI:10.1186/1471-2105-11-224
PMID:20433723
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2873541/

Abstract

BACKGROUND

Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of the genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can handle sequences only a few Mb long at a time, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer.

RESULTS

We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is a "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is a detailed alignment within those limited regions. In this paper, we present an updated version of the algorithm and Cgaln, an open-source program that implements it. We compared the performance of Cgaln with that of other programs on the whole genomic sequences of several bacteria and on some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB of memory.
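The two-step idea (block-level search first, nucleotide-level alignment only inside the selected regions) can be illustrated with a toy coarse-to-fine sketch. This is not Cgaln's implementation: the fixed block size, the shared-k-mer block score, the `min_shared` threshold, the Smith-Waterman scoring values, and all function names are hypothetical choices made for illustration only.

```python
from itertools import product

# Hypothetical toy sketch of a coarse-to-fine (two-step) alignment.
# Block size, the shared-k-mer block score, the threshold, and the
# Smith-Waterman scores are illustrative assumptions, not Cgaln's method.


def kmers(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def block_scores(a: str, b: str, block: int, k: int) -> dict:
    """Coarse step: score every (block of a, block of b) pair by shared k-mers.

    All pairs are scored regardless of order, so a block of `a` can match a
    block of `b` at a different position (e.g. after a segment swap).
    """
    ka = [kmers(a[i:i + block], k) for i in range(0, len(a), block)]
    kb = [kmers(b[j:j + block], k) for j in range(0, len(b), block)]
    return {(i, j): len(ka[i] & kb[j])
            for i, j in product(range(len(ka)), range(len(kb)))}


def local_align(x: str, y: str, match: int = 2, mismatch: int = -1,
                gap: int = -2) -> int:
    """Fine step: Smith-Waterman local alignment score (linear gaps, score only)."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def coarse_to_fine(a: str, b: str, block: int = 8, k: int = 4,
                   min_shared: int = 2) -> list:
    """Align at nucleotide level only the block pairs kept by the coarse step."""
    scores = block_scores(a, b, block, k)
    hits = sorted(pair for pair, s in scores.items() if s >= min_shared)
    results = []
    for i, j in hits:
        x = a[i * block:(i + 1) * block]
        y = b[j * block:(j + 1) * block]
        results.append(((i, j), local_align(x, y)))
    return results


if __name__ == "__main__":
    x, y = "ACGTTGCA", "GGGCCCTT"  # two 8-nt blocks sharing no 4-mers
    # b holds the same two blocks in swapped order relative to a
    print(coarse_to_fine(x + y, y + x))  # [((0, 1), 16), ((1, 0), 16)]
```

In this toy case the coarse step finds both crossed block pairs, which a single collinear alignment could not pair at once, and the fine step fills two 8 × 8 matrices (128 cells) instead of one 16 × 16 matrix (256 cells). The saving grows with sequence length, which is the intuition behind restricting nucleotide-level alignment to regions chosen at the block level.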

CONCLUSIONS

Cgaln is not only fast and memory-efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparing large genomes, especially intact chromosomal sequences. We believe that Cgaln provides a novel viewpoint for reducing computational complexity and will contribute to various fields of genome science.

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2787/2873541/d98a8dc5b0c1/1471-2105-11-224-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2787/2873541/9e985dcf7ad2/1471-2105-11-224-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2787/2873541/8c0870389223/1471-2105-11-224-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2787/2873541/bf457ac769fd/1471-2105-11-224-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2787/2873541/bcff48d77bf6/1471-2105-11-224-5.jpg

Similar articles

1. Cgaln: fast and space-efficient whole-genome alignment.
   BMC Bioinformatics. 2010 Apr 30;11:224. doi: 10.1186/1471-2105-11-224.
2. A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence.
   Nucleic Acids Res. 2008 May;36(8):2630-8. doi: 10.1093/nar/gkn105. Epub 2008 Mar 15.
3. G-Anchor: a novel approach for whole-genome comparative mapping utilizing evolutionary conserved DNA sequences.
   Gigascience. 2018 May 1;7(5). doi: 10.1093/gigascience/giy017.
4. GSAlign: an efficient sequence alignment tool for intra-species genomes.
   BMC Genomics. 2020 Feb 24;21(1):182. doi: 10.1186/s12864-020-6569-1.
5. CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes.
   BMC Bioinformatics. 2006 Oct 24;7:472. doi: 10.1186/1471-2105-7-472.
6. Mugsy: fast multiple alignment of closely related whole genomes.
   Bioinformatics. 2011 Feb 1;27(3):334-42. doi: 10.1093/bioinformatics/btq665. Epub 2010 Dec 9.
7. Murasaki: a fast, parallelizable algorithm to find anchors from multiple genomes.
   PLoS One. 2010 Sep 24;5(9):e12651. doi: 10.1371/journal.pone.0012651.
8. A fast adaptive algorithm for computing whole-genome homology maps.
   Bioinformatics. 2018 Sep 1;34(17):i748-i756. doi: 10.1093/bioinformatics/bty597.
9. Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment.
   PLoS One. 2021 Aug 3;16(8):e0255260. doi: 10.1371/journal.pone.0255260. eCollection 2021.
10. Fast algorithms for large-scale genome alignment and comparison.
    Nucleic Acids Res. 2002 Jun 1;30(11):2478-83. doi: 10.1093/nar/30.11.2478.

Cited by

1. Unraveling Genome Evolution Throughout Visual Analysis: The XCout Portal.
   Bioinform Biol Insights. 2021 Jun 8;15:11779322211021422. doi: 10.1177/11779322211021422. eCollection 2021.
2. GSAlign: an efficient sequence alignment tool for intra-species genomes.
   BMC Genomics. 2020 Feb 24;21(1):182. doi: 10.1186/s12864-020-6569-1.
3. Ultra-fast genome comparison for large-scale genomic experiments.