Hall Ira M, Quinlan Aaron R
Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, USA.
Methods Mol Biol. 2012;838:225-48. doi: 10.1007/978-1-61779-507-7_11.
Structural variation (SV) encompasses diverse types of genomic variants including deletions, duplications, inversions, transpositions, translocations, and complex rearrangements, and is now recognized to be an abundant class of genetic variation in mammals. Different individuals, or strains, of a given species can differ by thousands of variants. However, despite a large number of studies over the past decade and impressive progress on many fronts, there remain significant gaps in our knowledge, particularly in species other than human. Arguably the most relevant among these are genetically tractable models such as mouse, rat, and dog. The emergence of efficient and affordable DNA sequencing technologies presents an opportunity to make rapid progress toward understanding the nature, origin, and function of SV in these, and other, domesticated species. Here, we summarize the current state of knowledge of SV in mammals, with a focus on the similarities and differences between domesticated species and human. We then present methods to identify SV breakpoints from next-generation sequence (NGS) data by paired-end mapping, split-read mapping, and local assembly, and discuss challenges that arise when interpreting these data in lineages with complex breeding histories and incomplete reference genomes. We further describe technical modifications that allow for identification of variants involving repetitive DNA elements such as transposons and segmental duplications. Finally, we explore a few of the key biological insights that can be gained by applying NGS methods to model organisms.