
Adaptive seeds tame genomic sequence comparison.

Affiliation

Department of Computational Biology, Max Planck Institute for Molecular Genetics, Berlin D-14195, Germany.

Publication information

Genome Res. 2011 Mar;21(3):487-93. doi: 10.1101/gr.113985.110. Epub 2011 Jan 5.

Abstract

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than by their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches chosen for their rareness rather than for a fixed length. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
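The idea of choosing seeds by rareness can be illustrated with a minimal sketch. It is not the LAST implementation. It assumes one plausible rule: starting at each query position, lengthen the match until it occurs at most `max_hits` times in the reference. The names `adaptive_seeds` and `max_hits` are hypothetical, and a naive scan stands in for the index structure a real tool would need.

```python
def adaptive_seeds(reference, query, max_hits=2):
    """Return (query_start, length, reference_starts) for each adaptive seed.

    Assumed rule (illustrative only): at each query position, extend the
    match one base at a time until it occurs at most `max_hits` times in
    the reference. Rare words give short seeds, common words long ones.
    """
    seeds = []
    for q in range(len(query)):
        # Every reference position matches the empty string.
        candidates = list(range(len(reference)))
        length = 0
        # Keep extending while the match is still too common.
        while len(candidates) > max_hits and q + length < len(query):
            base = query[q + length]
            candidates = [
                r for r in candidates
                if r + length < len(reference) and reference[r + length] == base
            ]
            length += 1
        # Keep the seed only if it is non-empty, present, and rare enough.
        if length > 0 and 0 < len(candidates) <= max_hits:
            seeds.append((q, length, candidates))
    return seeds


if __name__ == "__main__":
    for seed in adaptive_seeds("ACGTACGTTTGA", "ACGTT", max_hits=1):
        print(seed)
```

In this sketch each query position contributes at most `max_hits` reference hits, so the total number of seed hits is bounded by `max_hits` times the query length, however repetitive the reference is. A fixed-length seed has no such bound: a common word can match at a number of positions proportional to the reference length.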

Similar articles

Multiple alignment of DNA sequences with MAFFT.
Methods Mol Biol. 2009;537:39-64. doi: 10.1007/978-1-59745-251-9_3.

Separating significant matches from spurious matches in DNA sequences.
J Comput Biol. 2012 Jan;19(1):1-12. doi: 10.1089/cmb.2011.0070. Epub 2011 Dec 9.

WindowMasker: window-based masker for sequenced genomes.
Bioinformatics. 2006 Jan 15;22(2):134-41. doi: 10.1093/bioinformatics/bti774. Epub 2005 Nov 15.

Indel seeds for homology search.
Bioinformatics. 2006 Jul 15;22(14):e341-9. doi: 10.1093/bioinformatics/btl263.

Cited by

NEAR: neural embeddings for amino acid relationships.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i449-i457. doi: 10.1093/bioinformatics/btaf198.
