Analysis of targeted and whole genome sequencing of PacBio HiFi reads for a comprehensive genotyping of gene-proximal and phenotype-associated Variable Number Tandem Repeats.

Author information

Javadzadeh Sara, Adamson Aaron, Park Jonghun, Jo Se-Young, Ding Yuan-Chun, Bakhtiari Mehrdad, Bansal Vikas, Neuhausen Susan L, Bafna Vineet

Affiliations

Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America.

Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California, United States of America.

Publication information

PLoS Comput Biol. 2025 Apr 7;21(4):e1012885. doi: 10.1371/journal.pcbi.1012885. eCollection 2025 Apr.

Abstract

Variable Number Tandem Repeats (VNTRs) are tandem repeats whose repeating motifs are longer than 5 bp. VNTRs are an important source of genetic variation and have been associated with multiple Mendelian and complex phenotypes. However, their highly repetitive structure requires reads that span the entire region for accurate genotyping. Pacific Biosciences (PacBio) HiFi sequencing produces highly accurate reads long enough to span large regions, but it is relatively expensive. Therefore, targeted sequencing approaches coupled with long-read sequencing have been proposed to improve efficiency and throughput. In this paper, we systematically explored the trade-off between targeted and whole genome HiFi sequencing for genotyping VNTRs. We curated a set of 10,787 gene-proximal (G-)VNTRs and 48 phenotype-associated (P-)VNTRs of interest. Illumina reads spanned only 46% of the G-VNTRs and 71% of the P-VNTRs, motivating the use of HiFi sequencing. We performed targeted sequencing with hybridization capture, designing custom probes for 9,999 VNTRs, sequenced 8 samples using both HiFi and Illumina sequencing, and genotyped the VNTRs with adVNTR. We compared these results against HiFi whole genome sequencing (WGS) data from 28 samples in the Human Pangenome Reference Consortium (HPRC). With the targeted approach, only 4,091 (41%) G-VNTRs and only 4 (8%) P-VNTRs were spanned by at least 15 reads. A smaller subset of 3,579 (36%) G-VNTRs had a higher median coverage of at least 63 spanning reads. The spanning behavior was consistent across all 8 samples. Among 5,638 VNTRs with low coverage (< 15 spanning reads), 67% were located within GC-rich regions (GC content > 60%). In contrast, the 40X WGS HiFi dataset spanned 98% of all VNTRs and 49 (98%) P-VNTRs with at least 15 spanning reads, albeit with lower coverage. Spanning reads were sufficient for accurate genotyping in both cases.
Our findings demonstrate that targeted sequencing provides consistently high coverage for a small subset of low-GC VNTRs, but WGS is more effective for broad and sufficient sampling of a large number of VNTRs.
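
The abstract's coverage logic has three parts. A read is informative only if it spans the whole VNTR. A locus counts as low coverage below 15 spanning reads. Low-coverage loci are then compared by GC content against a 60% cutoff. All three can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline or adVNTR's implementation. The `Read`/`Vntr` types, the half-open reference coordinates, the `flank` margin required on each side of the repeat, and computing GC content over a single supplied sequence are all hypothetical choices made here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

# Thresholds quoted in the abstract: >= 15 spanning reads, > 60% GC.
MIN_SPANNING_READS = 15
GC_RICH_THRESHOLD = 0.60


@dataclass(frozen=True)
class Read:
    """Hypothetical aligned read: half-open reference interval [start, end)."""
    start: int
    end: int


@dataclass(frozen=True)
class Vntr:
    """Hypothetical VNTR locus: half-open interval plus the sequence used for GC content."""
    start: int
    end: int
    seq: str


def spans(read: Read, vntr: Vntr, flank: int = 0) -> bool:
    """True if the read covers the whole repeat plus `flank` bases on each side (assumed rule)."""
    return read.start <= vntr.start - flank and read.end >= vntr.end + flank


def count_spanning(reads: Iterable[Read], vntr: Vntr, flank: int = 0) -> int:
    """Number of reads that fully span the locus."""
    return sum(1 for r in reads if spans(r, vntr, flank))


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq (case-insensitive); 0.0 for an empty sequence."""
    s = seq.upper()
    if not s:
        return 0.0
    return (s.count("G") + s.count("C")) / len(s)


def classify(reads: Iterable[Read], vntr: Vntr, flank: int = 0) -> dict:
    """Spanning-read count plus low-coverage and GC-rich flags for one locus."""
    n = count_spanning(reads, vntr, flank)
    return {
        "spanning_reads": n,
        "low_coverage": n < MIN_SPANNING_READS,
        "gc_rich": gc_fraction(vntr.seq) > GC_RICH_THRESHOLD,
    }


def gc_rich_share_of_low_coverage(results: list[dict]) -> float:
    """Among low-coverage loci, the fraction that are GC-rich (cf. the 67% figure)."""
    low = [r for r in results if r["low_coverage"]]
    if not low:
        return 0.0
    return sum(r["gc_rich"] for r in low) / len(low)


if __name__ == "__main__":
    locus = Vntr(start=100, end=160, seq="GCGCGGCCGA")  # toy locus, 90% GC
    reads = [Read(50, 200), Read(120, 300), Read(80, 170)]
    result = classify(reads, locus, flank=10)
    print(result)  # {'spanning_reads': 2, 'low_coverage': True, 'gc_rich': True}
    print(gc_rich_share_of_low_coverage([result]))  # 1.0
```

A read that starts or ends inside the repeat (or inside the required flank) is not counted, which is why this criterion favors reads much longer than the locus. The `flank` margin is a hypothetical parameter. The abstract does not state how much flanking sequence was required or over which window GC content was measured.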

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c3c/11975116/bb4ecf6d64d6/pcbi.1012885.g001.jpg