Analysis of targeted and whole genome sequencing of PacBio HiFi reads for a comprehensive genotyping of gene-proximal and phenotype-associated Variable Number Tandem Repeats.

Author information

Javadzadeh Sara, Adamson Aaron, Park Jonghun, Jo Se-Young, Ding Yuan-Chun, Bakhtiari Mehrdad, Bansal Vikas, Neuhausen Susan L, Bafna Vineet

Affiliations

Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America.

Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California, United States of America.

Publication information

PLoS Comput Biol. 2025 Apr 7;21(4):e1012885. doi: 10.1371/journal.pcbi.1012885. eCollection 2025 Apr.

DOI:10.1371/journal.pcbi.1012885

PMID:40193344

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11975116/

Abstract

Variable Number Tandem Repeats (VNTRs) refer to repeating motifs of size greater than five bp. VNTRs are an important source of genetic variation and have been associated with multiple Mendelian and complex phenotypes. However, their highly repetitive structure requires reads to span the region for accurate genotyping. Pacific Biosciences HiFi sequencing spans large regions and is highly accurate but relatively expensive. Therefore, targeted sequencing approaches coupled with long-read sequencing have been proposed to improve efficiency and throughput. In this paper, we systematically explored the trade-off between targeted and whole genome HiFi sequencing for genotyping VNTRs. We curated a set of 10,787 gene-proximal (G-)VNTRs and 48 phenotype-associated (P-)VNTRs of interest. Illumina reads spanned only 46% of the G-VNTRs and 71% of the P-VNTRs, motivating the use of HiFi sequencing. We performed targeted sequencing with hybridization by designing custom probes for 9,999 VNTRs and sequenced 8 samples using HiFi and Illumina sequencing, followed by adVNTR genotyping. We compared these results against HiFi whole genome sequencing (WGS) data from 28 samples in the Human Pangenome Reference Consortium (HPRC). With the targeted approach, only 4,091 (41%) G-VNTRs and only 4 (8%) of P-VNTRs were spanned with at least 15 reads. A smaller subset of 3,579 (36%) G-VNTRs had higher median coverage of at least 63 spanning reads. The spanning behavior was consistent across all 8 samples. Among 5,638 VNTRs with low coverage (<15), 67% were located within GC-rich regions (>60%). In contrast, the 40X WGS HiFi dataset spanned 98% of all VNTRs and 49 (98%) of P-VNTRs with at least 15 spanning reads, albeit with lower coverage. Spanning reads were sufficient for accurate genotyping in both cases. Our findings demonstrate that targeted sequencing provides consistently high coverage for a small subset of low-GC VNTRs, but WGS is more effective for broad and sufficient sampling of a large number of VNTRs.
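
The two quantities the abstract relies on, the number of reads spanning a VNTR and the GC fraction of a region, can be illustrated with a minimal Python sketch. This is not the authors' pipeline or adVNTR's logic. The interval representation, the `flank` parameter, and all function names are assumptions made for illustration. Only the thresholds (at least 15 spanning reads, GC content above 60%) come from the abstract.

```python
from typing import List, Tuple

# Half-open [start, end) reference coordinates; an assumed representation.
Interval = Tuple[int, int]

MIN_SPANNING_READS = 15  # coverage threshold quoted in the abstract
GC_RICH_FRACTION = 0.60  # GC-rich cutoff quoted in the abstract


def count_spanning_reads(vntr: Interval, reads: List[Interval], flank: int = 0) -> int:
    """Count read alignments covering the whole VNTR plus `flank` bp on each side.

    `flank` is a hypothetical parameter; real genotypers may apply
    different anchoring criteria.
    """
    start, end = vntr
    return sum(
        1
        for r_start, r_end in reads
        if r_start <= start - flank and r_end >= end + flank
    )


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_well_covered(vntr: Interval, reads: List[Interval], flank: int = 0) -> bool:
    """True if the VNTR has at least MIN_SPANNING_READS spanning reads."""
    return count_spanning_reads(vntr, reads, flank) >= MIN_SPANNING_READS


def is_gc_rich(seq: str) -> bool:
    """True if the GC fraction of the region exceeds GC_RICH_FRACTION."""
    return gc_fraction(seq) > GC_RICH_FRACTION
```

In practice the read intervals would come from an alignment file rather than a hand-built list, and a read that only partially overlaps the repeat does not count as spanning, which is why short reads cover fewer VNTRs than HiFi reads.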

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c3c/11975116/bb4ecf6d64d6/pcbi.1012885.g001.jpg

Cited by

Long Read Genome Sequencing Elucidates Diverse Functional Consequences of Structural and Repeat Variation in Autism.

medRxiv. 2025 Jul 23:2025.07.20.25331880. doi: 10.1101/2025.07.20.25331880.
