Magner Ricky, Cunial Fabio, Basu Sumit, Paulsen Ron, Saponas Scott, Shand Megan, Lennon Niall, Banks Eric
Broad Institute of Harvard and MIT, Cambridge, MA, USA.
Microsoft Research, Redmond, WA, USA.
bioRxiv. 2025 Sep 4:2024.11.01.621515. doi: 10.1101/2024.11.01.621515.
We introduce blend-seq, a workflow for combining data from traditional short-read sequencing pipelines with low-coverage long reads, to improve variant discovery for single samples without the full cost of high-coverage long reads. We demonstrate that with only 4x long-read coverage augmenting 30x short reads, we can improve SNP discovery across the genome, exceeding the performance of even high-coverage short reads (60x). For genotype-agnostic discovery of structural variants, we see a threefold improvement in recall while maintaining precision by using the low-coverage long reads on their own, and we show how adding in the short-read data can improve genotyping accuracy. In addition, we demonstrate how the long reads can better phase these variants, incorporating long-context information in the genome to substantially outperform phasing with short reads alone. Our experiments highlight the complementary nature of short- and long-read technologies: the former contributes higher depth for genotyping, and the latter provides better resolution of larger events or those in difficult regions.