Nguyen Ngan, Hickey Glenn, Zerbino Daniel R, Raney Brian, Earl Dent, Armstrong Joel, Kent W James, Haussler David, Paten Benedict
Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California.
J Comput Biol. 2015 May;22(5):387-401. doi: 10.1089/cmb.2014.0146. Epub 2015 Jan 7.
A reference genome is a high-quality individual genome that is used as a coordinate system for the genomes of a population, or for the genomes of closely related subspecies. Given a set of genomes partitioned by homology into alignment blocks, we formalize the problem of ordering and orienting the blocks such that the resulting ordering maximally agrees with the ordering and orientation of the underlying genomes, creating a pan-genome reference ordering. We show that this problem is NP-hard, but also demonstrate, empirically and in simulations, the performance of heuristic algorithms based upon a cactus graph decomposition to find locally maximal solutions. We describe an extension of our Cactus software to create a pan-genome reference for whole-genome alignments, and demonstrate how it can be used to create novel genome browser visualizations, using human variation data as a test. In addition, we test the use of a pan-genome for describing variations and as a reference for read mapping.
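To make the ordering-and-orientation problem concrete, the sketch below scores a candidate reference against a few toy genomes and then searches every signed permutation exhaustively. It is an illustration under stated assumptions, not the objective function or the cactus-graph heuristic from the paper: blocks are represented as signed integers (the sign is the orientation), the score simply counts consecutive block pairs in each genome that the reference preserves on either strand, and the names `agreement` and `best_reference` are hypothetical.

```python
from itertools import permutations, product


def agreement(reference, genomes):
    """Count consecutive genome block pairs that the reference preserves.

    Assumption: a simplified adjacency score, not the paper's objective.
    A pair (a, b) agrees if both blocks have the reference orientation and
    a precedes b, or both are flipped and b precedes a (reverse strand).
    """
    # Map each block id to its (position, sign) in the candidate reference.
    pos = {abs(b): (i, 1 if b > 0 else -1) for i, b in enumerate(reference)}
    score = 0
    for genome in genomes:
        for a, b in zip(genome, genome[1:]):
            if abs(a) not in pos or abs(b) not in pos:
                continue
            ia, sa = pos[abs(a)]
            ib, sb = pos[abs(b)]
            # +1 if the genome uses the block in the reference orientation.
            ra = sa * (1 if a > 0 else -1)
            rb = sb * (1 if b > 0 else -1)
            if ra == 1 and rb == 1 and ia < ib:
                score += 1
            elif ra == -1 and rb == -1 and ib < ia:
                score += 1
    return score


def best_reference(genomes):
    """Exhaustively try every order and orientation (tiny inputs only)."""
    blocks = sorted({abs(b) for g in genomes for b in g})
    best, best_score = None, -1
    for order in permutations(blocks):
        for signs in product((1, -1), repeat=len(blocks)):
            candidate = [s * b for s, b in zip(signs, order)]
            s = agreement(candidate, genomes)
            if s > best_score:  # keep the first candidate on ties
                best, best_score = candidate, s
    return best, best_score


# Toy data: the second genome carries an inversion of blocks 2 and 3.
genomes = [[1, 2, 3, 4], [1, -3, -2, 4], [1, 2, 3, 4]]
print(best_reference(genomes))
```

The exhaustive search visits n! · 2^n candidates for n blocks, which is only feasible for a handful of blocks. That growth is consistent with the NP-hardness result and with the use of heuristics that settle for locally maximal solutions.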