Timothy L. Bailey
Institute for Molecular Bioscience (IMB), The University of Queensland.
Methods Mol Biol. 2007;395:271-92. doi: 10.1007/978-1-59745-514-5_17.
Sequence motif discovery algorithms are an important part of the computational biologist's toolkit. The purpose of motif discovery is to find patterns in biopolymer (nucleotide or protein) sequences in order to better understand the structure and function of the molecules those sequences represent. This chapter provides an overview of how sequence motif discovery is used in biology, along with a general guide to using motif discovery algorithms. It examines the types of biological features that DNA and protein motifs can represent and why they are useful. It also defines what sequence motifs are, how they are represented, and the general techniques for discovering them. The primary focus is one aspect of motif discovery: discovering motifs in a set of unaligned DNA or protein sequences. The chapter also outlines useful steps for checking the biological validity of sequence motifs and investigating their function, using methods such as motif scanning (searching for matches to a motif in a given sequence or in a database of sequences). It concludes with a discussion of some limitations of motif discovery.
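The abstract refers to how motifs are represented and to motif scanning without showing either. As a minimal sketch, assuming the widely used position weight matrix (PWM) representation with log2-odds scores against a uniform background (the chapter's own choice of representation and scoring is not stated here), the Python below builds a PWM from a few aligned DNA sites and scans one strand of a sequence for windows that score at or above a threshold. The function names, the pseudocount of 0.25, the threshold, and the toy sites are hypothetical illustrations, not taken from the chapter or from any particular motif tool.

```python
import math

ALPHABET = "ACGT"


def count_matrix(sites):
    """Count nucleotide occurrences at each position of equal-length aligned sites."""
    width = len(sites[0])
    counts = [{b: 0 for b in ALPHABET} for _ in range(width)]
    for site in sites:
        if len(site) != width:
            raise ValueError("all sites must have the same length")
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1  # raises KeyError for non-ACGT characters
    return counts


def log_odds_pwm(counts, background=None, pseudocount=0.25):
    """Convert per-position counts into a log2-odds position weight matrix.

    Assumption: a uniform background (0.25 per base) unless one is supplied.
    The pseudocount keeps unseen bases from producing log(0).
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    pwm = []
    for col in counts:
        total = sum(col.values()) + pseudocount * len(ALPHABET)
        pwm.append({
            b: math.log2(((col[b] + pseudocount) / total) / background[b])
            for b in ALPHABET
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, score) for each window on the given strand scoring >= threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in ALPHABET for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy, hypothetical aligned sites used only to illustrate the mechanics.
    sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
    pwm = log_odds_pwm(count_matrix(sites))
    for offset, score in scan("GGTATAATCC", pwm, threshold=0.0):
        print(offset, round(score, 2))
```

This sketch scores only the strand it is given and reports raw log-odds scores. Scanning tools commonly also score the reverse complement and convert scores into statistical significance values before deciding what counts as a match, and neither step is attempted here.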