Braham Dhillon, Stephen B. Goodwin
USDA-ARS, Crop Production and Pest Control Research Unit, Purdue University, West Lafayette, IN, USA.
Methods Mol Biol. 2011;722:33-50. doi: 10.1007/978-1-61779-040-9_3.
Advances in sequencing technologies have fundamentally changed the pace of genome sequencing projects and have contributed to an ever-increasing volume of genomic data. This has been paralleled by growth in the computational power and resources available to process raw sequence data and translate it into meaningful information. In addition to protein-coding regions, every genome studied so far contains repetitive sequences as an integral component. Although repetitive sequences were once dismissed as "junk," numerous studies have implicated them in important biological and structural roles in the genome. The identification and characterization of these repeats has therefore become an indispensable part of genome sequencing projects. Numerous similarity-based and de novo methods have been developed to search for and annotate repeats in the genome, and many of them are discussed in this chapter.
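As a rough illustration of the idea behind de novo repeat detection (finding repeats from the sequence itself, without a reference library), the sketch below counts k-mers and reports those that occur more than once. This is a minimal toy example, not any of the specific tools discussed in the chapter. The function names `canonical` and `repeated_kmers` and the parameters `k` and `min_count` are hypothetical. Each k-mer is collapsed with its reverse complement so that a repeat is counted in either orientation, and windows that contain ambiguous bases (e.g. `N`) are skipped.

```python
from collections import Counter

# Translation table for DNA complementation (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def repeated_kmers(sequence: str, k: int = 8, min_count: int = 2) -> dict[str, int]:
    """Count canonical k-mers and keep those seen at least `min_count` times.

    Hypothetical toy seed finder: high-copy k-mers are candidate repeat seeds.
    Windows containing non-ACGT characters (e.g. N) are ignored.
    """
    seq = sequence.upper()
    counts: Counter[str] = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _VALID:
            counts[canonical(kmer)] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    unit = "ATGCGTACGT"  # a 10-bp "repeat" inserted twice
    genome = "TTTT" + unit + "CCCC" + unit + "AAAA"
    print(repeated_kmers(genome, k=10))
```

Practical de novo tools go well beyond this. Typically they extend high-frequency seeds into longer alignments and cluster them into consensus repeat families, and they use memory-efficient counting structures for large genomes.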