Weerakoon Minindu, Lee Sangjin, Mitchell Emily, Heaton Haynes
Auburn University, Auburn, AL, 36849, USA.
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.
BMC Bioinformatics. 2025 Jan 16;26(1):17. doi: 10.1186/s12859-024-06020-0.
Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10+ kb) and highly accurate reads. This is achieved by sequencing each circularized DNA molecule multiple times and combining the resulting passes into a consensus sequence. Currently, the accuracy and quality value estimation provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling. However, the estimated quality scores are not accurate enough for somatic variant calling on single reads.
To address the challenge of inaccurate quality scores for somatic variant calling, we introduce TopoQual, a novel tool designed to enhance the accuracy of base quality predictions. TopoQual leverages techniques including partial order alignments (POA), topologically parallel bases, and deep learning algorithms to polish consensus sequences. Our results demonstrate that TopoQual corrects approximately 31.9% of errors in PacBio consensus sequences. Additionally, it validates base qualities up to q59, which corresponds to one error in 0.9 million bases. These improvements will significantly enhance the reliability of somatic variant calling using HiFi data.
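As a reading aid for the quality values quoted above, the following minimal sketch converts between Phred-scaled base qualities and error probabilities using the standard definition Q = -10 log10(p). The function names are illustrative assumptions and are not part of TopoQual. Under this definition, Q59 corresponds to an error probability of roughly 1.3e-6, that is, on the order of one error per 0.8 million bases, the same order of magnitude as the rounded figure given in the abstract.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p (0 < p <= 1) to a Phred score Q = -10*log10(p)."""
    return -10 * math.log10(p)


def bases_per_error(q: float) -> float:
    """Expected number of bases per single error at Phred quality Q."""
    return 1 / phred_to_error_prob(q)


if __name__ == "__main__":
    # Print the error rate implied by a few representative quality values.
    for q in (20, 30, 40, 59):
        p = phred_to_error_prob(q)
        print(f"Q{q}: p = {p:.3g}, about one error per {bases_per_error(q):,.0f} bases")
```

The inverse function, `error_prob_to_phred`, would be the natural way to express an empirically measured error rate (errors divided by bases examined) as a validated quality value.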
TopoQual represents a significant advancement in genomics by improving the accuracy of base quality predictions for PacBio HiFi sequencing data. By correcting a substantial proportion of errors and achieving high base quality validation, TopoQual enables confident and accurate somatic variant calling. This tool not only addresses a critical limitation of current HiFi technology but also opens new possibilities for precise genomic analysis in various research and clinical applications.