Cue: a deep-learning framework for structural variant discovery and genotyping.

Affiliations

Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Publication information

Nat Methods. 2023 Apr;20(4):559-568. doi: 10.1038/s41592-023-01799-x. Epub 2023 Mar 23.

DOI:10.1038/s41592-023-01799-x

PMID:36959322

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10152467/

Abstract

Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.
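The abstract describes two steps: turning alignments into images that carry SV signals, then reading SV calls off the network's output. The pure-Python sketch below illustrates only the general shape of those steps, under assumptions stated here. It assumes a single image channel that counts read pairs by (read bin, mate bin) over a genomic interval. It also assumes that a peak at pixel (row, col) marks an SV whose two breakpoints fall in bins row and col. The function names (`encode_pair_image`, `find_peaks`, `call_breakpoints`) and this keypoint convention are hypothetical and are not Cue's actual encoding or API. In Cue, a trained stacked hourglass network produces the prediction maps and also assigns SV type and genotype. Here, for illustration only, the raw count image stands in for a predicted heatmap.

```python
from typing import List, Tuple

Image = List[List[int]]


def encode_pair_image(pairs: List[Tuple[int, int]], start: int, end: int, n_bins: int) -> Image:
    """Count read pairs into an n_bins x n_bins image over the interval [start, end).

    Hypothetical single-channel encoding: pixel (i, j) counts pairs whose read
    starts in bin i and whose mate starts in bin j. Pairs with either end
    outside the interval are ignored.
    """
    if end <= start or n_bins <= 0:
        raise ValueError("need end > start and n_bins > 0")
    image = [[0] * n_bins for _ in range(n_bins)]
    span = end - start
    for read_pos, mate_pos in pairs:
        if start <= read_pos < end and start <= mate_pos < end:
            # Integer arithmetic keeps every index in [0, n_bins).
            i = (read_pos - start) * n_bins // span
            j = (mate_pos - start) * n_bins // span
            image[i][j] += 1
    return image


def find_peaks(heatmap: Image, threshold: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, value) for pixels >= threshold that are >= all 8 neighbours."""
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = (
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if all(value >= n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


def bin_midpoint(b: int, start: int, end: int, n_bins: int) -> int:
    """Map a bin index back to the (floored) genomic midpoint of that bin."""
    return start + (2 * b + 1) * (end - start) // (2 * n_bins)


def call_breakpoints(heatmap: Image, start: int, end: int, threshold: float) -> List[Tuple[int, int, float]]:
    """Turn heatmap peaks into (breakpoint_1, breakpoint_2, score) tuples (assumed convention)."""
    n_bins = len(heatmap)
    return [
        (bin_midpoint(r, start, end, n_bins), bin_midpoint(c, start, end, n_bins), score)
        for r, c, score in find_peaks(heatmap, threshold)
    ]


# Toy data: three discordant pairs spanning a putative deletion near 300-700,
# plus one concordant pair near position 50.
pairs = [(295, 705), (290, 710), (298, 702), (50, 60)]
image = encode_pair_image(pairs, start=0, end=1000, n_bins=10)
print(call_breakpoints(image, start=0, end=1000, threshold=2))  # [(250, 750, 3)]
```

The 3x3 local-maximum test is a common way to decode keypoint heatmaps. With the `>=` comparison used here, a plateau of equal values yields several adjacent peaks, which a real pipeline would merge. Breakpoint resolution is limited to the bin size, 100 bp in this toy example.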
