Blended Length Genome Sequencing (blend-seq): Combining Short Reads with Low-Coverage Long Reads to Maximize Variant Discovery.

Author information

Magner Ricky, Cunial Fabio, Basu Sumit, Paulsen Ron, Saponas Scott, Shand Megan, Lennon Niall, Banks Eric

Affiliations

Broad Institute of Harvard and MIT, Cambridge, MA, USA.

Microsoft Research, Redmond, WA, USA.

Publication information

bioRxiv. 2025 Sep 4:2024.11.01.621515. doi: 10.1101/2024.11.01.621515.

DOI: 10.1101/2024.11.01.621515

PMID: 40950019

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12424989/

Abstract

We introduce blend-seq, a workflow for combining data from traditional short-read sequencing pipelines with low-coverage long reads, to improve variant discovery for single samples without the full cost of high-coverage long reads. We demonstrate that with only 4x long-read coverage augmenting 30x short reads, we can improve SNP discovery across the genome, even exceeding the performance of high-coverage (60x) short reads. For genotype-agnostic discovery of structural variants, using the low-coverage long reads on their own yields a threefold improvement in recall while maintaining precision, and we show how adding the short-read data improves genotyping accuracy. In addition, we demonstrate how the long reads can better phase these variants, incorporating long-context information in the genome to substantially outperform phasing with short reads alone. Our experiments highlight the complementary nature of short- and long-read technologies: the former contributing higher depth for genotyping and the latter providing better resolution of larger events and of events in difficult regions.
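
For rough intuition about why a 4x long-read layer can support discovery while deeper short reads help with genotyping, the sketch below uses an idealized Poisson (Lander-Waterman style) coverage model. This is an illustrative assumption, not the analysis performed in the paper; the function names are hypothetical, and real data deviate from the model because of GC bias, mappability, and variable read length.

```python
import math

# Toy model (assumption, not the authors' method): per-base read depth is
# Poisson(c) with mean coverage c, and each read is equally likely to come
# from either haplotype.


def frac_uncovered(c: float) -> float:
    """Expected fraction of genome positions with zero reads at mean coverage c."""
    return math.exp(-c)


def frac_at_least(c: float, k: int) -> float:
    """Expected fraction of positions covered by at least k reads, P(Poisson(c) >= k)."""
    below_k = sum(math.exp(-c) * c**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


def het_both_alleles_seen(c: float) -> float:
    """Probability that a heterozygous site has >= 1 read from each haplotype.

    Splitting Poisson(c) reads evenly at random between two haplotypes gives
    independent Poisson(c / 2) depths per haplotype, so
    P(both seen) = (1 - exp(-c / 2)) ** 2.
    """
    return (1.0 - math.exp(-c / 2.0)) ** 2


if __name__ == "__main__":
    # Coverage levels from the abstract; the read-type labels only name the rows,
    # since this model ignores read length and per-technology error profiles.
    for label, c in [("4x long reads", 4.0), ("30x short reads", 30.0), ("60x short reads", 60.0)]:
        print(
            f"{label:>16}: uncovered={frac_uncovered(c):.4f}  "
            f">=10 reads={frac_at_least(c, 10):.4f}  "
            f"het both alleles={het_both_alleles_seen(c):.4f}"
        )
```

Under this idealization, about 98% of positions receive at least one read at 4x (since e^-4 is about 0.018), but only about 75% of heterozygous sites get a read from each haplotype, whereas at 30x essentially every site does. This matches the abstract's framing of long reads for resolving events and short-read depth for genotyping, though the toy model says nothing about mapping quality in difficult regions.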

Similar articles

A personalized multi-platform assessment of somatic mosaicism in the human frontal cortex.

bioRxiv. 2024 Dec 21:2024.12.18.629274. doi: 10.1101/2024.12.18.629274.

SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales.

J Bioinform Comput Biol. 2024 Oct;22(5):2450022. doi: 10.1142/S0219720024500227. Epub 2024 Oct 1.

Blackbird: structural variant detection using synthetic and low-coverage long-reads.

bioRxiv. 2024 Nov 18:2024.11.17.624011. doi: 10.1101/2024.11.17.624011.

Can a Liquid Biopsy Detect Circulating Tumor DNA With Low-passage Whole-genome Sequencing in Patients With a Sarcoma? A Pilot Evaluation.

Clin Orthop Relat Res. 2025 Jan 1;483(1):39-48. doi: 10.1097/CORR.0000000000003161. Epub 2024 Jun 21.

HiFi long-read genomes for difficult-to-detect, clinically relevant variants.

Am J Hum Genet. 2025 Feb 6;112(2):450-456. doi: 10.1016/j.ajhg.2024.12.013. Epub 2025 Jan 13.

Experimental and Computational Methods for Allelic Imbalance Analysis from Single-Nucleus RNA-seq Data.

bioRxiv. 2025 Jan 15:2024.08.13.607784. doi: 10.1101/2024.08.13.607784.

De novo Genome Assembly Using Long Reads and Chromosome Conformation Capture.

Methods Mol Biol. 2025;2935:1-27. doi: 10.1007/978-1-0716-4583-3_1.

Analysis of targeted and whole genome sequencing of PacBio HiFi reads for a comprehensive genotyping of gene-proximal and phenotype-associated Variable Number Tandem Repeats.

PLoS Comput Biol. 2025 Apr 7;21(4):e1012885. doi: 10.1371/journal.pcbi.1012885. eCollection 2025 Apr.

Post-pandemic planning for maternity care for local, regional, and national maternity systems across the four nations: a mixed-methods study.

Health Soc Care Deliv Res. 2025 Sep;13(35):1-25. doi: 10.3310/HHTE6611.
