Adaptive seeds tame genomic sequence comparison.

Affiliation

Department of Computational Biology, Max Planck Institute for Molecular Genetics, Berlin D-14195, Germany.

Publication information

Genome Res. 2011 Mar;21(3):487-93. doi: 10.1101/gr.113985.110. Epub 2011 Jan 5.

Abstract

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
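
The idea of choosing seeds by rareness can be illustrated with a small sketch. This is not LAST's code: the function names, the `max_hits` parameter, and the naively built suffix array used as the index are all assumptions made for illustration. For each start position in the query, the match is lengthened until it occurs at most `max_hits` times in the reference, so a common prefix such as a repeat yields a long seed and a rare one yields a short seed. Because each query position contributes at most `max_hits` matches, the total number of matches is bounded by `max_hits` times the query length.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate only for tiny examples.
    return sorted(range(len(text)), key=lambda i: text[i:])


def adaptive_seeds(reference, query, max_hits=1):
    """Return (query_start, seed_length, reference_positions) tuples.

    Hypothetical helper: for each query start, the seed is the shortest
    match that occurs at least once and at most max_hits times in reference.
    """
    sa = build_suffix_array(reference)
    seeds = []
    for q in range(len(query)):
        lo, hi = 0, len(sa)  # suffixes in sa[lo:hi] start with query[q:q+depth]
        depth = 0
        while hi - lo > max_hits and q + depth < len(query):
            c = query[q + depth]
            # Next character of each candidate suffix ("" if the suffix ends).
            # Within sa[lo:hi] this column is sorted, so bisection is valid.
            column = [reference[p + depth:p + depth + 1] for p in sa[lo:hi]]
            lo, hi = lo + bisect_left(column, c), lo + bisect_right(column, c)
            depth += 1
        # Keep the seed only if it is rare enough and actually occurs.
        if depth > 0 and 0 < hi - lo <= max_hits:
            seeds.append((q, depth, sorted(sa[lo:hi])))
    return seeds


if __name__ == "__main__":
    # "AC" is repeated in the reference, so the seed starting at query
    # position 0 must grow to length 3 before it becomes unique.
    print(adaptive_seeds("ACACACGT", "ACGT", max_hits=1))
    # [(0, 3, [4]), (1, 2, [5]), (2, 1, [6]), (3, 1, [7])]
```

A fixed-length seed of length 2 would instead report all three occurrences of "AC"; on repeat-rich sequences that is the kind of blow-up in match count the abstract attributes to nonuniform composition.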
