Polishing copy number variant calls on exome sequencing data via deep learning.

Affiliations

Department of Computer Engineering, Bilkent University, 06800 Ankara, Turkey.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.

Publication information

Genome Res. 2022 Jun;32(6):1170-1182. doi: 10.1101/gr.274845.120. Epub 2022 Jun 13.

Abstract

Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses the matched WES and WGS data, and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves the performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.
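
The precision gains quoted in the abstract are reported separately for duplication and deletion calls. As a minimal sketch of how such per-type precision could be measured against WGS-derived labels, consider the following. This is not code from the paper: the function name `per_class_precision`, the label strings, and the toy data are all illustrative assumptions.

```python
from collections import Counter


def per_class_precision(predicted, truth, classes=("DEL", "DUP")):
    """Per call type: (calls of that type matching the truth) / (all calls of that type)."""
    called = Counter()
    correct = Counter()
    for pred, true in zip(predicted, truth):
        if pred in classes:
            called[pred] += 1
            if pred == true:
                correct[pred] += 1
    # NaN when a call type was never emitted, since precision is undefined there.
    return {c: (correct[c] / called[c] if called[c] else float("nan")) for c in classes}


# Toy labels, made up for illustration only (hypothetical, not data from the study):
# WGS-derived truth, raw WES caller output, and output after a hypothetical polishing step.
truth    = ["NO-CALL", "DUP", "NO-CALL", "DEL", "NO-CALL", "DEL"]
raw      = ["DUP",     "DUP", "DUP",     "DEL", "DEL",     "DEL"]
polished = ["NO-CALL", "DUP", "NO-CALL", "DEL", "NO-CALL", "DEL"]

print("raw:     ", per_class_precision(raw, truth))
print("polished:", per_class_precision(polished, truth))
```

In this toy case the polishing step removes false-positive calls, so precision for each call type rises; the real evaluation in the study is of course carried out on matched WES and WGS samples rather than hand-written labels.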

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/892a/9248885/76c65344be5c/1170f01.jpg
