Lo Christine, Liu Rui, Lee Jehyuk, Robasky Kimberly, Byrne Susan, Lucchesi Carolina, Aach John, Church George, Bafna Vineet, Zhang Kun
Genome Biol. 2013;14(9):R100. doi: 10.1186/gb-2013-14-9-r100.
Haplotypes are important for assessing the genealogy and disease susceptibility of individual genomes, but are difficult to obtain with routine sequencing approaches. Experimental haplotype reconstruction based on assembling fragments of individual chromosomes is promising, but its yields vary because the effects of parameter choices are incompletely understood.
We parameterize the clone-based haplotyping problem in order to provide theoretical and empirical assessments of the impact of different parameters on haplotype assembly. We confirm the intuition that long clones help link together heterozygous variants and thus improve haplotype length. Furthermore, given the length of the clones, we address how to choose the other parameters, including the number of pools, clone coverage and sequencing coverage, so as to maximize haplotype length. We model the problem theoretically and show empirically the benefits of using larger clones with a moderate number of pools and moderate sequencing coverage. In particular, using 140 kb BAC clones, we construct haplotypes for a personal genome and assemble haplotypes with N50 values greater than 2.6 Mb. These assembled haplotypes are longer than, and at least as accurate as, haplotypes from existing clone-based strategies, whether in vivo or in vitro.
Our results provide practical guidelines for the development and design of clone-based methods to achieve long-range, high-resolution and accurate haplotypes.
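For readers unfamiliar with the N50 statistic used to summarize haplotype length, the following is a minimal sketch of the standard definition: the block length at which blocks of that length or longer account for at least half of the total assembled length. The function name `n50` and the example block lengths are hypothetical and are not taken from the paper; the authors may compute N50 over a different quantity, such as the genomic span of each block.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    Standard definition (an assumption about what the paper uses):
    sort blocks from longest to shortest and return the length of the
    block at which the running total first reaches half of the sum.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one block length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical haplotype block lengths in kb, for illustration only.
blocks_kb = [3000, 2600, 1200, 800, 400]
print(n50(blocks_kb))  # prints 2600
```

In this made-up example the total is 8000 kb, and the two longest blocks together cover 5600 kb, which passes the 4000 kb halfway mark, so the N50 is the length of the second block.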