Koren Sergey, Rhie Arang, Walenz Brian P, Dilthey Alexander T, Bickhart Derek M, Kingan Sarah B, Hiendleder Stefan, Williams John L, Smith Timothy P L, Phillippy Adam M
Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA.
Institute of Medical Microbiology, Heinrich-Heine-University Düsseldorf, Düsseldorf, North Rhine-Westphalia, Germany.
Nat Biotechnol. 2018 Oct 22. doi: 10.1038/nbt.4277.
Complex allelic variation hampers the assembly of haplotype-resolved sequences from diploid genomes. We developed trio binning, an approach that simplifies haplotype assembly by resolving allelic variation before assembly. In contrast with prior approaches, the effectiveness of our method improved with increasing heterozygosity. Trio binning uses short reads from two parental genomes to first partition long reads from an offspring into haplotype-specific sets. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction. We used trio binning to recover both haplotypes of a diploid human genome and identified complex structural variants missed by alternative approaches. We sequenced an F1 cross between the cattle subspecies Bos taurus taurus and Bos taurus indicus and completely assembled both parental haplotypes with NG50 haplotig sizes of >20 Mb and 99.998% accuracy, surpassing the quality of current cattle reference genomes. We suggest that trio binning improves diploid genome assembly and will facilitate new studies of haplotype variation and inheritance.
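The partitioning step described above can be illustrated with a toy sketch. It assumes that reads are compared through k-mers: k-mers seen in only one parent's short reads are treated as haplotype markers, and each offspring long read goes to the parent whose markers it contains more often. The abstract does not specify this mechanism, so the k-mer approach, the function names, the value of k, and the example sequences are all illustrative assumptions rather than the authors' implementation. A real pipeline would also filter low-count k-mers arising from sequencing errors and normalize for the number of markers per parent.

```python
# Toy sketch of parent-guided read binning (illustrative assumptions only).

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_set(reads, k):
    """Collect the canonical k-mers present in a collection of reads."""
    found = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            found.add(canonical(read[i:i + k]))
    return found


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets."""
    mat = kmer_set(maternal_reads, k)
    pat = kmer_set(paternal_reads, k)
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign one offspring read to 'maternal', 'paternal', or 'unassigned'."""
    hits_m = hits_p = 0
    read = read.upper()
    for i in range(len(read) - k + 1):
        km = canonical(read[i:i + k])
        if km in mat_only:
            hits_m += 1
        elif km in pat_only:
            hits_p += 1
    if hits_m > hits_p:
        return "maternal"
    if hits_p > hits_m:
        return "paternal"
    return "unassigned"  # no markers, or a tie


# Hypothetical toy data: the parents differ at a single base.
K = 5
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], K)

offspring_reads = ["ACGTACGT", "CGTTCGTA", "GGGGGGG"]
bins = {"maternal": [], "paternal": [], "unassigned": []}
for r in offspring_reads:
    bins[bin_read(r, mat_only, pat_only, K)].append(r)
```

In this sketch each of the `maternal` and `paternal` bins would then be handed to an assembler separately, while reads in `unassigned` carry no parent-specific signal and would need a separate policy, for example being included in both assemblies.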