A universal SNP and small-indel variant caller using deep neural networks.

Author information

Verily Life Sciences, Mountain View, California, USA.

Google Inc., Mountain View, California, USA.

Publication information

Nat Biotechnol. 2018 Nov;36(10):983-987. doi: 10.1038/nbt.4235. Epub 2018 Sep 24.

DOI: 10.1038/nbt.4235

PMID: 30247488

Abstract

Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.
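The core idea in the abstract, turning the reads around a candidate site into an image-like array and letting a classifier assign a genotype, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the base-to-intensity mapping, the two-channel layout, the `encode_pileup` and `call_genotype` helpers, and the three diploid genotype classes are hypothetical and are not DeepVariant's actual encoding or API. The convolutional network itself is omitted; a hard-coded list of scores stands in for its output.

```python
import math

# Hypothetical base-to-intensity mapping; not DeepVariant's actual encoding.
BASE_CODE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}
# Assumed diploid genotype classes for a single candidate site.
GENOTYPES = ("hom_ref", "het", "hom_alt")


def encode_pileup(reference, reads, max_reads=8):
    """Turn a reference window plus aligned reads into a fixed-size array.

    Output shape: (max_reads + 1) rows x len(reference) columns x 2 channels.
    Channel 0 holds the base code; channel 1 is 1.0 where a read base differs
    from the reference. Row 0 is the reference itself.
    """
    width = len(reference)
    rows = [[[BASE_CODE.get(b, 0.0), 0.0] for b in reference]]
    for read in reads[:max_reads]:
        row = []
        for pos in range(width):
            base = read[pos] if pos < len(read) else "-"
            mismatch = 1.0 if base in BASE_CODE and base != reference[pos] else 0.0
            row.append([BASE_CODE.get(base, 0.0), mismatch])
        rows.append(row)
    # Pad with empty rows so every candidate site yields the same shape.
    while len(rows) < max_reads + 1:
        rows.append([[0.0, 0.0] for _ in range(width)])
    return rows


def softmax(scores):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_genotype(scores):
    """Pick the most probable genotype from three class scores."""
    probs = softmax(scores)
    best = max(range(len(GENOTYPES)), key=lambda i: probs[i])
    return GENOTYPES[best], probs


# Two of three reads carry a T where the reference has a G at position 2.
image = encode_pileup("ACGTA", ["ACGTA", "ACTTA", "ACTTA"])
# Stand-in scores; in a real caller these would come from the trained network.
genotype, probs = call_genotype([0.1, 2.3, 0.4])
```

A fixed-shape array is used here so that every candidate site can be fed to the same classifier, which is the property that lets one trained model be applied across samples.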

Similar articles

dv-trio: a family-based variant calling pipeline using DeepVariant.

Bioinformatics. 2020 Jun 1;36(11):3549-3551. doi: 10.1093/bioinformatics/btaa116.

Lean and deep models for more accurate filtering of SNP and INDEL variant calls.

Bioinformatics. 2020 Apr 1;36(7):2060-2067. doi: 10.1093/bioinformatics/btz901.

A multi-task convolutional deep neural network for variant calling in single molecule sequencing.

Nat Commun. 2019 Mar 1;10(1):998. doi: 10.1038/s41467-019-09025-z.

DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network.

BMC Bioinformatics. 2019 Dec 12;20(1):665. doi: 10.1186/s12859-019-3299-y.

Using genotype array data to compare multi- and single-sample variant calls and improve variant call sets from deep coverage whole-genome sequencing data.

Bioinformatics. 2017 Apr 15;33(8):1147-1153. doi: 10.1093/bioinformatics/btw786.

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls.

Nat Biotechnol. 2014 Mar;32(3):246-51. doi: 10.1038/nbt.2835. Epub 2014 Feb 16.

Calling known variants and identifying new variants while rapidly aligning sequence data.

J Dairy Sci. 2019 Apr;102(4):3216-3229. doi: 10.3168/jds.2018-15172. Epub 2019 Feb 14.

HELLO: improved neural network architectures and methodologies for small variant calling.

BMC Bioinformatics. 2021 Aug 14;22(1):404. doi: 10.1186/s12859-021-04311-4.

Joint variant and de novo mutation identification on pedigrees from high-throughput sequencing data.

J Comput Biol. 2014 Jun;21(6):405-19. doi: 10.1089/cmb.2014.0029.

Cited by

Performance comparison of germline variant calling tools in sporadic disease cohorts.

Mol Genet Genomics. 2025 Sep 6;300(1):90. doi: 10.1007/s00438-025-02292-0.

Finding easy regions for short-read variant calling from pangenome data.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf103.

Artificial Intelligence and Chromothripsis.

Methods Mol Biol. 2025;2968:281-289. doi: 10.1007/978-1-0716-4750-9_16.

Identification of a novel POU4F3 frameshift variant in a Chinese family with autosomal dominant hearing loss.

Eur J Med Res. 2025 Aug 27;30(1):816. doi: 10.1186/s40001-025-03078-1.

Learning a refinement model for variant analysis in non-human primate genomes.

BMC Genomics. 2025 Aug 25;26(1):775. doi: 10.1186/s12864-025-11921-2.

Increasing pathogenic germline variant diagnosis rates in precision medicine: current best practices and future opportunities.

Hum Genomics. 2025 Aug 22;19(1):97. doi: 10.1186/s40246-025-00811-z.

Beyond the genome: the role of functional markers in contemporary plant breeding.

Front Plant Sci. 2025 Aug 5;16:1637299. doi: 10.3389/fpls.2025.1637299. eCollection 2025.

Evolutionary Consequences of Unusually Large Pericentric TE-rich Regions in the Genome of a Neotropical Fig Wasp.

Genome Biol Evol. 2025 Sep 2;17(9). doi: 10.1093/gbe/evaf158.

Indel calling from ONT sequencing data of family trios via sparse attention and 3D convolution.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf430.

Selection Signature Analysis of Whole-Genome Sequences to Identify Genome Differences Between Selected and Unselected Holstein Cattle.

Animals (Basel). 2025 Jul 31;15(15):2247. doi: 10.3390/ani15152247.
