Asri Mobin, Chang Pi-Chuan, Mier Juan Carlos, Sirén Jouni, Eskandar Parsa, Kolesnikov Alexey, Cook Daniel E, Brambrink Lucas, Hickey Glenn, Novak Adam M, Dorfman Lizzie, Webster Dale R, Carroll Andrew, Paten Benedict, Shafin Kishwar
UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.
Google Inc, Mountain View, CA, USA.
bioRxiv. 2025 Jun 6:2025.06.05.657102. doi: 10.1101/2025.06.05.657102.
Population-scale genomic information provides valuable prior knowledge for various genomic analyses, especially variant calling. A notable example is the human pangenome reference released by the Human Pangenome Reference Consortium, which has been shown to improve read mapping and structural variant genotyping. In this work, we introduce pangenome-aware DeepVariant, a variant caller that uses a pangenome reference alongside sample-specific read alignments. It generates pileup images of both reads and pangenome haplotypes near potential variants and uses a convolutional neural network to infer genotypes. This approach allows the pangenome to be used directly to distinguish true variant signals from sequencing or alignment noise. We assessed its performance on various short-read sequencing platforms and read mappers. Across all settings, pangenome-aware DeepVariant outperformed linear-reference-based DeepVariant, reducing errors by up to 25.5%. We also show that Element reads with pangenome-aware DeepVariant can achieve 23.6% more accurate variant calling than existing methods.