School of Life Science and Technology, Tokyo Institute of Technology, Meguro-ku, Tokyo, 152-8550, Japan.
Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan.
Nat Commun. 2019 Apr 12;10(1):1702. doi: 10.1038/s41467-019-09575-2.
The ultimate goal of diploid genome determination is to decode each homologous chromosome completely and independently, and several programs that phase haplotypes from consensus sequences have been developed. These methods work well for genomes with low heterozygosity, but many species have highly heterozygous genomes. In addition, genomes contain highly divergent regions (HDRs), where the haplotype sequences differ considerably. Because HDRs are likely to direct various interesting biological phenomena, many targets of genomic analysis fall within these regions. However, existing phasing methods cannot access HDRs, so costly traditional methods have to be adopted. Here, we develop a de novo haplotype assembler, Platanus-allee (http://platanus.bio.titech.ac.jp/platanus2), which first constructs each haplotype sequence and then untangles the assembly graphs using sequence links and synteny information. A comprehensive benchmark analysis reveals that Platanus-allee exhibits high recall and precision, particularly for HDRs. Using this approach, we detect previously unknown HDRs in the human genome, which may uncover novel aspects of genome variability.