Accurate in silico confirmation of rare copy number variant calls from exome sequencing data using transfer learning.

Author information

Department of Systems Biology, Columbia University, New York, NY 10032, USA.

Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.

Publication information

Nucleic Acids Res. 2022 Nov 28;50(21):e123. doi: 10.1093/nar/gkac788.

Abstract

Exome sequencing is widely used in genetic studies of human diseases and in clinical genetic diagnosis. Accurate detection of copy number variants (CNVs) is important for fully utilizing exome sequencing data. However, exome data are noisy, and no existing method alone achieves both high precision and high recall. A common practice is to apply heuristic filters and then manually inspect the read depth of putative CNVs, an approach that does not scale to large studies. To address this issue, we developed a transfer learning method, CNV-espresso, for in silico confirmation of rare CNVs from exome sequencing data. CNV-espresso encodes candidate CNVs from exome data as images and uses pretrained convolutional neural network models to classify copy number states. We trained CNV-espresso on an offspring-parents trio exome sequencing dataset, with inherited CNVs as positives and CNVs with Mendelian errors as negatives. We evaluated its performance using additional samples that have both exome and whole-genome sequencing (WGS) data. Taking the CNVs detected from WGS data as a proxy for ground truth, CNV-espresso significantly improves precision while keeping recall almost intact, especially for CNVs that span a small number of exons. CNV-espresso can effectively replace manual inspection of CNVs in large-scale exome sequencing studies.
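The evaluation described in the abstract, in which WGS calls stand in for ground truth, can be illustrated with a small sketch. This is not the authors' code. The `CNV` record, the `reciprocal_overlap` and `precision_recall` helpers, the 50% reciprocal-overlap matching rule, and the toy coordinates are all assumptions made for illustration; the abstract does not state how exome and WGS calls were matched.

```python
from typing import NamedTuple, Sequence, Tuple


class CNV(NamedTuple):
    """Hypothetical CNV record; field names are illustrative, not from the paper."""
    chrom: str
    start: int  # inclusive start coordinate (assumed convention)
    end: int    # inclusive end coordinate
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 if chromosome or type differs."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def precision_recall(
    calls: Sequence[CNV],
    truth: Sequence[CNV],
    min_overlap: float = 0.5,  # assumed matching threshold, not from the source
) -> Tuple[float, float]:
    """Precision and recall of exome calls against a proxy truth set (e.g. WGS calls)."""
    confirmed = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls
    )
    recovered = sum(
        any(reciprocal_overlap(t, c) >= min_overlap for c in calls) for t in truth
    )
    precision = confirmed / len(calls) if calls else 0.0
    recall = recovered / len(truth) if truth else 0.0
    return precision, recall


# Toy data: two WGS-supported CNVs and four raw exome calls (two are false positives).
wgs_truth = [CNV("1", 1000, 2000, "DEL"), CNV("2", 5000, 9000, "DUP")]
raw_calls = [
    CNV("1", 1100, 2100, "DEL"),
    CNV("2", 5000, 9000, "DUP"),
    CNV("3", 100, 900, "DEL"),
    CNV("4", 100, 900, "DUP"),
]
# Suppose a confirmation step rejects the last raw call.
confirmed_calls = raw_calls[:3]

print(precision_recall(raw_calls, wgs_truth))
print(precision_recall(confirmed_calls, wgs_truth))
```

In this toy case, dropping one false positive raises precision from 2/4 to 2/3 while recall stays at 2/2, which is the kind of trade-off the abstract reports for the confirmation step.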

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7fe/9756945/d3153384725b/gkac788fig1.jpg
