

Sequence Comparison Without Alignment: The SpaM Approaches.

Author information

Morgenstern Burkhard

Affiliation

University of Göttingen, Department of Bioinformatics (IMG), Göttingen, Germany.

Publication information

Methods Mol Biol. 2021;2231:121-134. doi: 10.1007/978-1-0716-1036-7_8.

Abstract

Sequence alignment is at the heart of DNA and protein sequence analysis. For the data volumes now produced by massively parallel sequencing technologies, however, pairwise and multiple alignment methods are often too slow. Fast alignment-free approaches to sequence comparison have therefore become popular in recent years. Most of these approaches are based on word frequencies, for words of a fixed length, or on word-matching statistics. Other approaches use the length of maximal word matches. While these methods are very fast, most of them rely on ad hoc measures of sequence similarity or dissimilarity that are hard to interpret. In this chapter, I describe a number of alignment-free methods that we developed in recent years. Our approaches are based on spaced-word matches ("SpaM"), i.e., inexact word matches that are allowed to contain mismatches at certain pre-defined positions. Unlike most previous alignment-free approaches, our approaches are able to accurately estimate phylogenetic distances between DNA or protein sequences using a stochastic model of molecular evolution.
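To make the idea of a spaced-word match concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the chapter's implementation: the binary pattern notation (`1` for a position that must match, `0` for a position where a mismatch is allowed), the function names, and the choice of the Jukes-Cantor correction as the stochastic model are all assumptions made for this example, since the abstract does not specify any of them. The sketch finds all spaced-word matches between two DNA sequences, counts mismatches at the positions where they are allowed, and converts that fraction into a distance estimate.

```python
import math
from collections import defaultdict


def spaced_word_matches(seq_a, seq_b, pattern):
    """Yield (i, j) start positions where seq_a and seq_b agree at every
    '1' position of the pattern; '0' positions may hold mismatches.
    The pattern notation is an assumption made for this sketch."""
    match_pos = [k for k, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    # Index seq_b by the characters found at the match positions.
    index = defaultdict(list)
    for j in range(len(seq_b) - span + 1):
        key = "".join(seq_b[j + k] for k in match_pos)
        index[key].append(j)
    # Look up every window of seq_a in that index.
    for i in range(len(seq_a) - span + 1):
        key = "".join(seq_a[i + k] for k in match_pos)
        for j in index.get(key, []):
            yield i, j


def mismatch_fraction(seq_a, seq_b, pattern, matches):
    """Fraction of '0' positions that differ, pooled over all matches."""
    free_pos = [k for k, c in enumerate(pattern) if c == "0"]
    total = 0
    mismatches = 0
    for i, j in matches:
        for k in free_pos:
            total += 1
            mismatches += seq_a[i + k] != seq_b[j + k]
    return mismatches / total if total else float("nan")


def jukes_cantor(p):
    """Jukes-Cantor distance for DNA from a mismatch fraction p.
    Chosen here only as a simple example of a stochastic model."""
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    a, b, pat = "ACGTACGT", "ACCTACGT", "1101"
    hits = list(spaced_word_matches(a, b, pat))
    p = mismatch_fraction(a, b, pat, hits)
    print(hits, p, jukes_cantor(p))
```

Note that this sketch pools every match it finds, including chance matches between unrelated positions, which inflate the mismatch fraction. A practical method would need some way to separate such background matches from matches between related positions; that step is deliberately left out here.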

