IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1277-1284. doi: 10.1109/TCBB.2020.3005673. Epub 2022 Jun 3.
Advances in DNA sequencing technologies make it possible to obtain reads originating from both copies of a chromosome (the two parental chromosomes, or haplotypes) of a single individual. Reconstructing both haplotypes (i.e., haplotype phasing) plays a crucial role in genetic analysis and provides information on the relationship between genetic variation and disease susceptibility. With the emergence of third-generation sequencing technologies, most existing approaches to haplotype phasing suffer from performance issues when handling long, error-prone reads. We develop a divide-and-conquer algorithm, DCHap, to phase haplotypes using third-generation reads. We benchmark DCHap against three state-of-the-art phasing tools on both PacBio SMRT data and ONT Nanopore data. The experimental results show that DCHap produces results that are more accurate than, or comparable to, those of the other tools (measured by switch errors) while scaling to higher coverage and longer reads. DCHap is a fast and accurate algorithm for haplotype phasing using third-generation sequencing data. As third-generation sequencing platforms continue to improve in throughput and read length, accurate and scalable tools like DCHap are important for haplotype phasing to benefit from advances in sequencing technologies. The source code is freely available at https://github.com/yanboANU/Haplotype-phasing.
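Phasing accuracy in the abstract is measured by switch errors. As a rough illustration of that metric only, the following minimal Python sketch counts switch errors between a known haplotype and a predicted one within a single phased block. It is not taken from the DCHap source code. The function names and the encoding of a haplotype as a string of '0'/'1' alleles at heterozygous sites are assumptions made for this example.

```python
def count_switch_errors(truth, predicted):
    """Count switch errors between a true and a predicted haplotype.

    Both arguments are equal-length strings over {'0', '1'} giving the allele
    carried by one haplotype at each heterozygous site of one phased block
    (a hypothetical encoding chosen for this sketch).
    """
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    # agree[i] is True when the predicted allele matches the truth at site i.
    agree = [t == p for t, p in zip(truth, predicted)]
    # A switch error occurs wherever agreement flips between adjacent sites,
    # i.e. the prediction jumps from one true haplotype to the other.
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)


def switch_error_rate(truth, predicted):
    """Switch errors divided by the number of adjacent site pairs."""
    n = len(truth)
    if n < 2:
        return 0.0  # no adjacent pair, so no switch is possible
    return count_switch_errors(truth, predicted) / (n - 1)


if __name__ == "__main__":
    print(count_switch_errors("000000", "000111"))  # one switch after site 3
    print(switch_error_rate("000000", "000111"))
```

Under this definition, a prediction that is the exact complement of the truth has zero switch errors, because it describes the same pair of haplotypes with the labels exchanged.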