Paten Benedict, Novak Adam M, Eizenga Jordan M, Garrison Erik
Genomics Institute, CBSE, 501C Engineering 2, University of California Santa Cruz, Santa Cruz, California 95064, USA.
Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom.
Genome Res. 2017 May;27(5):665-676. doi: 10.1101/gr.214155.116. Epub 2017 Mar 30.
The human reference genome is part of the foundation of modern human biology and a monumental scientific achievement. However, because it excludes a great deal of common human variation, it introduces a pervasive reference bias into the field of human genomics. To reduce this bias, it makes sense to draw on representative collections of human genomes, brought together into reference cohorts. There are a number of techniques to represent and organize data gleaned from these cohorts, many using ideas implicitly or explicitly borrowed from graph-based models. Here, we survey various projects underway to build and apply these graph-based structures, which we collectively refer to as genome graphs, and discuss the improvements in read mapping, variant calling, and haplotype determination that genome graphs are expected to produce.