Polishing copy number variant calls on exome sequencing data via deep learning.

Affiliations

Department of Computer Engineering, Bilkent University, 06800 Ankara, Turkey.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.

Publication information

Genome Res. 2022 Jun;32(6):1170-1182. doi: 10.1101/gr.274845.120. Epub 2022 Jun 13.

DOI:10.1101/gr.274845.120

PMID:35697522

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9248885/

Abstract

Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate, as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses matched WES and WGS data and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.
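The abstract's central idea, scoring exome CNV calls against matched WGS-derived calls and then keeping, relabeling, or discarding them with a learned polisher, can be illustrated with a minimal Python sketch. It assumes a 50% reciprocal-overlap rule for matching a WES call to a WGS call and reports precision as TP / (TP + FP) per event type (deletion or duplication). All names are hypothetical; in particular, `toy_predict` is a simple length filter standing in for a trained model and is not DECoNT's network.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class CNVCall:
    chrom: str
    start: int  # 0-based start (inclusive)
    end: int    # end coordinate (exclusive)
    kind: str   # "DEL" (deletion) or "DUP" (duplication)


def reciprocal_overlap(a: CNVCall, b: CNVCall) -> float:
    """Fraction of each interval covered by the other; the smaller fraction is returned."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def wgs_label(call: CNVCall, truth: List[CNVCall], min_ro: float = 0.5) -> Optional[str]:
    """Event type supported by the matched WGS calls, or None if none overlaps enough.
    The 0.5 reciprocal-overlap threshold is an assumption for this sketch."""
    for t in truth:
        if reciprocal_overlap(call, t) >= min_ro:
            return t.kind
    return None


def precision(calls: List[CNVCall], truth: List[CNVCall], kind: str) -> float:
    """TP / (TP + FP) for one event type; returns 0.0 when there are no calls of that type."""
    subset = [c for c in calls if c.kind == kind]
    if not subset:
        return 0.0
    tp = sum(1 for c in subset if wgs_label(c, truth) == kind)
    return tp / len(subset)


def polish(calls: List[CNVCall], predict: Callable[[CNVCall], Optional[str]]) -> List[CNVCall]:
    """Keep, relabel, or drop each WES call according to a predictor.
    `predict` returns "DEL", "DUP", or None (None means the call is discarded)."""
    polished = []
    for c in calls:
        new_kind = predict(c)
        if new_kind is not None:
            polished.append(CNVCall(c.chrom, c.start, c.end, new_kind))
    return polished


def toy_predict(call: CNVCall) -> Optional[str]:
    """Hypothetical stand-in for a trained polisher: drops calls shorter than 1 kb."""
    return call.kind if call.end - call.start >= 1000 else None


# Toy data: WGS-derived calls serve as the reference for the WES calls.
truth = [CNVCall("chr1", 1000, 5000, "DEL"), CNVCall("chr2", 0, 2000, "DUP")]
wes = [
    CNVCall("chr1", 1200, 4800, "DEL"),  # supported by WGS
    CNVCall("chr1", 9000, 9500, "DEL"),  # false positive
    CNVCall("chr2", 100, 1900, "DUP"),   # supported by WGS
    CNVCall("chr3", 0, 500, "DUP"),      # false positive
]
polished = polish(wes, toy_predict)
print(precision(wes, truth, "DEL"), precision(polished, truth, "DEL"))  # 0.5 1.0
```

The `predict` interface returns an event type rather than a yes/no flag, so a learned polisher plugged in here could also relabel a call (for example, a deletion as a duplication), not only filter it; the toy rule only demonstrates the plumbing.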

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/892a/9248885/76c65344be5c/1170f01.jpg

Similar articles

Polishing copy number variant calls on exome sequencing data via deep learning.

Genome Res. 2022 Jun;32(6):1170-1182. doi: 10.1101/gr.274845.120. Epub 2022 Jun 13.

ECOLE: Learning to call copy number variants on whole exome sequencing data.

Nat Commun. 2024 Jan 2;15(1):132. doi: 10.1038/s41467-023-44116-y.

Evaluation of somatic copy number estimation tools for whole-exome sequencing data.

Brief Bioinform. 2016 Mar;17(2):185-92. doi: 10.1093/bib/bbv055. Epub 2015 Jul 25.

An evaluation of copy number variation detection tools for cancer using whole exome sequencing data.

BMC Bioinformatics. 2017 May 31;18(1):286. doi: 10.1186/s12859-017-1705-x.

Accurate in silico confirmation of rare copy number variant calls from exome sequencing data using transfer learning.

Nucleic Acids Res. 2022 Nov 28;50(21):e123. doi: 10.1093/nar/gkac788.

Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth.

Am J Hum Genet. 2012 Oct 5;91(4):597-607. doi: 10.1016/j.ajhg.2012.08.005.

Allele-specific copy-number discovery from whole-genome and whole-exome sequencing.

Nucleic Acids Res. 2015 Aug 18;43(14):e90. doi: 10.1093/nar/gkv319. Epub 2015 Apr 16.

A machine-learning approach for accurate detection of copy number variants from exome sequencing.

Genome Res. 2019 Jul;29(7):1134-1143. doi: 10.1101/gr.245928.118. Epub 2019 Jun 6.

Assessing the reproducibility of exome copy number variations predictions.

Genome Med. 2016 Aug 8;8(1):82. doi: 10.1186/s13073-016-0336-6.

Detection of clinically relevant copy number variants with whole-exome sequencing.

Hum Mutat. 2013 Oct;34(10):1439-48. doi: 10.1002/humu.22387. Epub 2013 Aug 30.

Cited by

Mutational Analysis of Early, Low-Grade Bowel Polyps Defines a Subgroup with Concurrent, High-Risk Oncogenic Drivers Independent of Polyp Size.

Cancer Res Commun. 2025 Aug 1;5(8):1372-1383. doi: 10.1158/2767-9764.CRC-25-0182.

LYCEUM: learning to call copy number variants on low-coverage ancient genomes.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i285-i293. doi: 10.1093/bioinformatics/btaf244.

Should Scotland provide genome-wide sequencing for the diagnosis of rare developmental disorders? A cost-effectiveness analysis.

Eur J Health Econ. 2025 Apr;26(3):503-512. doi: 10.1007/s10198-024-01717-8. Epub 2024 Sep 9.

CopyVAE: a variational autoencoder-based approach for copy number variation inference using single-cell transcriptomics.

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae284.

ECOLE: Learning to call copy number variants on whole exome sequencing data.

Nat Commun. 2024 Jan 2;15(1):132. doi: 10.1038/s41467-023-44116-y.

Identification of copy number variants contributing to hallux valgus.

Front Genet. 2023 Mar 23;14:1116284. doi: 10.3389/fgene.2023.1116284. eCollection 2023.
