Cue: a deep-learning framework for structural variant discovery and genotyping.

Affiliations

Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Publication information

Nat Methods. 2023 Apr;20(4):559-568. doi: 10.1038/s41592-023-01799-x. Epub 2023 Mar 23.

Abstract

Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.
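
To make the alignment-to-image idea concrete, here is a minimal sketch under stated assumptions; it is not Cue's actual implementation. It bins hypothetical read-pair coordinates over a genomic interval into a 2D count matrix, which is one simple way to turn alignments into an image-like input for a convolutional network. The helper name `read_pairs_to_image`, the bin count, and the toy coordinates are all invented for illustration.

```python
import numpy as np

def read_pairs_to_image(pairs, start, end, bins=8):
    """Bin (read_pos, mate_pos) pairs over [start, end) into a bins x bins count image.

    Hypothetical helper for illustration only; not part of Cue.
    """
    image = np.zeros((bins, bins), dtype=np.float32)
    width = (end - start) / bins
    for read_pos, mate_pos in pairs:
        # Skip pairs with either end outside the interval.
        if not (start <= read_pos < end and start <= mate_pos < end):
            continue
        row = int((read_pos - start) // width)
        col = int((mate_pos - start) // width)
        image[row, col] += 1.0
    return image

# Toy data (assumed): concordant pairs fall on the diagonal, while the last
# two pairs have widely separated mates, a pattern a deletion could produce.
toy_pairs = [(100, 350), (1200, 1450), (2500, 2750), (1000, 6100), (1050, 6150)]
image = read_pairs_to_image(toy_pairs, start=0, end=8000, bins=8)
```

In this toy image, the counts away from the diagonal (for example `image[1, 6]`) are the kind of pattern a network could learn to associate with an SV. The abstract does not specify which signals Cue encodes, so this single read-pair channel is only a stand-in.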
