Large-Scale Sequence Comparison.

Author information

Lal Devi, Verma Mansi

Affiliations

Ramjas College, University of Delhi, New Delhi, 110 007, India.

Sri Venkateswara College, University of Delhi (South Campus), Benito Juarez Road, Dhaula Kuan, New Delhi, 110 021, India.

Publication information

Methods Mol Biol. 2017;1525:191-224. doi: 10.1007/978-1-4939-6622-6_9.

DOI:10.1007/978-1-4939-6622-6_9

PMID:27896723

Abstract

Genomic databases hold millions of sequences, and categorizing them by structural and functional role is an important task. Sequence comparison is a prerequisite for categorizing both DNA and protein sequences properly, and it helps in assigning a putative or hypothetical structure and function to a given sequence. Of the various methods available for comparing sequences, alignment is the foremost, both for sequences with a small number of base pairs and for large-scale genome comparison. Various tools are available for pairwise comparison of large sequences; the best known either perform a global alignment or generate local alignments between the two sequences. In this chapter we first provide basic information on sequence comparison, followed by a description of the PAM and BLOSUM matrices that form the basis of sequence comparison. We also give a practical overview of currently available methods such as BLAST and FASTA, and then describe tools available for genome comparison, including LAGAN, MUMmer, BLASTZ, and AVID.
