Giles Tom C, Emes Richard D
School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, LE12 5RD, UK.
Advanced Data Analysis Centre, University of Nottingham, Leicestershire, LE12 5RD, UK.
Methods Mol Biol. 2017;1526:23-40. doi: 10.1007/978-1-4939-6613-4_2.
Recent technological advances in sequencing and high-throughput DNA cloning have generated vast quantities of biological sequence data. Ideally, the functions of individual genes and proteins predicted by these methods should be assessed experimentally within the context of a defined hypothesis. However, if no hypothesis is known a priori, or the number of sequences to be assessed is large, bioinformatics techniques may be useful in predicting function. This chapter proposes a pipeline of freely available Web-based tools to analyze protein-coding DNA and peptide sequences of unknown function. Accumulated information obtained during each step of the pipeline is used to build a testable hypothesis of function. The following methods are described in detail: 1. Annotation of gene function through protein domain detection (SMART and Pfam). 2. Sequence similarity methods for homolog detection (BLAST and DELTA-BLAST). 3. Comparing sequences to whole genome data.
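Domain detection and protein-level similarity searches take a peptide sequence as input, so a protein-coding DNA sequence is usually translated first. The following is a minimal sketch of that preparatory step, not a method taken from the chapter. It assumes the standard genetic code (NCBI translation table 1), a sequence that is already in the correct reading frame, and FASTA as the submission format; the function names `translate` and `to_fasta` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), an assumption of this sketch.
# Amino acids are listed in TCAG codon order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon.

    Whitespace is ignored, RNA input (U) is accepted, and any codon that
    contains an ambiguous base is reported as 'X'.
    """
    seq = "".join(dna.split()).upper().replace("U", "T")
    peptide = []
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

For example, `translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")` returns `"MAIVMGR"`, and `to_fasta("query1", "MAIVMGR")` wraps that peptide in a FASTA record. Sequences whose reading frame is unknown would need all six frames examined, which this sketch does not attempt.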