Blackbird: structural variant detection using synthetic and low-coverage long-reads.

Author information

Meleshko Dmitry, Yang Rui, Maharjan Salil, Danko David C, Korobeynikov Anton, Hajirasouliha Iman

Affiliations

Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY 10021, United States.

Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, NY 10021, United States.

Publication information

Bioinform Adv. 2025 Jul 4;5(1):vbaf151. doi: 10.1093/bioadv/vbaf151. eCollection 2025.

Abstract

MOTIVATION

Recent benchmarks show that most structural variants, especially those in the 50-10,000 bp range, cannot be resolved with short-read sequencing, whereas long-read structural variant callers perform better on the same datasets. However, high-coverage long-read sequencing is costly and requires substantial input DNA. Reducing coverage lowers cost but significantly degrades the performance of existing structural variant (SV) callers. Synthetic long-read technologies offer long-range information at lower cost, but leveraging them to detect SVs shorter than 50 kbp remains challenging.

RESULTS

We propose Blackbird, a novel hybrid alignment- and local-assembly-based algorithm that uses synthetic long reads and low-coverage long reads to improve structural variant detection. Instead of relying on whole-genome assembly, Blackbird uses a sliding-window approach and synthetic long-read barcode information to assemble local segments, then integrates long reads to improve detection accuracy. We evaluated Blackbird on real human genome datasets. On the HG002 Genome in a Bottle (GIAB) benchmark, Blackbird in hybrid mode produced results comparable to those of state-of-the-art long-read tools while using less long-read coverage. Blackbird requires only 5× coverage to achieve F1 scores (0.835 for deletions and 0.808 for insertions) similar to those of PBSV and Sniffles2 at 10× PacBio Hi-Fi long-read coverage.
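The sliding-window idea can be illustrated with a minimal sketch. This is not Blackbird's implementation: the function names, the window and overlap sizes, and the representation of an alignment as a `(barcode, position)` pair are all assumptions made for illustration. The sketch tiles a chromosome with overlapping windows and collects, for each window, the barcodes of the synthetic long reads that align inside it, which is the kind of grouping a local assembler could then consume.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Window = Tuple[int, int]  # half-open interval [start, end)


def sliding_windows(chrom_length: int, window: int, overlap: int) -> Iterator[Window]:
    """Yield overlapping half-open windows that tile [0, chrom_length).

    The window and overlap sizes are hypothetical parameters, not values
    taken from the Blackbird paper.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        yield start, end
        if end == chrom_length:
            break
        start += step


def barcodes_per_window(
    alignments: Iterable[Tuple[str, int]], windows: List[Window]
) -> Dict[Window, Set[str]]:
    """Group read barcodes by every window that contains the read position.

    Each alignment is an assumed (barcode, position) pair; a real pipeline
    would read these from an alignment file instead.
    """
    grouped: Dict[Window, Set[str]] = defaultdict(set)
    for barcode, pos in alignments:
        for start, end in windows:
            if start <= pos < end:
                grouped[(start, end)].add(barcode)
    return dict(grouped)


windows = list(sliding_windows(chrom_length=100, window=50, overlap=10))
groups = barcodes_per_window([("A", 5), ("B", 45), ("A", 85)], windows)
```

Overlapping windows mean a read near a window boundary is assigned to both neighbours, so a variant that straddles a boundary is still fully covered by at least one window as long as it is shorter than the overlap.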

AVAILABILITY AND IMPLEMENTATION

Blackbird is available at https://github.com/1dayac/Blackbird.
