DANN: a deep learning approach for annotating the pathogenicity of genetic variants.

Author information

Quang Daniel, Chen Yifei, Xie Xiaohui

Affiliations

Department of Computer Science and Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA.

Publication information

Bioinformatics. 2015 Mar 1;31(5):761-3. doi: 10.1093/bioinformatics/btu703. Epub 2014 Oct 22.

DOI:10.1093/bioinformatics/btu703

PMID:25338716

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4341060/

Abstract

Annotating genetic variants, especially non-coding variants, for the purpose of identifying pathogenic variants remains a challenge. Combined annotation-dependent depletion (CADD) is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. CADD trains a linear kernel support vector machine (SVM) to differentiate evolutionarily derived, likely benign, alleles from simulated, likely deleterious, variants. However, SVMs cannot capture non-linear relationships among the features, which can limit performance. To address this issue, we have developed DANN. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large number of samples and features. We exploit Compute Unified Device Architecture-compatible graphics processing units and deep learning techniques such as dropout and momentum training to accelerate the DNN training. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD's SVM methodology.
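
The abstract's central argument, that a linear decision rule cannot capture interactions among features while a network with a hidden layer can, can be illustrated with a minimal sketch. The code below is not DANN's architecture or training procedure: the XOR toy data, the hand-set weights, and all function names are assumptions made for illustration. The helper `relative_change` shows how a "relative" reduction or increase is computed; the error rates passed to it in the usage note are hypothetical and are not results from the paper.

```python
from itertools import product

# XOR: the label depends on an interaction between the two features.
# Toy data for illustration only; not the CADD/DANN training set.
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def relu(z):
    """Rectified linear unit."""
    return z if z > 0 else 0.0


def linear_predict(w1, w2, b, x):
    """Linear decision rule: predict 1 when w1*x1 + w2*x2 + b > 0."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


def tiny_network_predict(x):
    """One hidden layer with two ReLU units; weights are set by hand."""
    h1 = relu(x[0] + x[1])        # positive when at least one feature is on
    h2 = relu(x[0] + x[1] - 1.0)  # positive only when both features are on
    return 1 if h1 - 2.0 * h2 > 0.5 else 0


def accuracy(predict):
    """Fraction of XOR_DATA examples that `predict` labels correctly."""
    return sum(predict(x) == y for x, y in XOR_DATA) / len(XOR_DATA)


def best_linear_accuracy():
    """Best accuracy over a coarse grid of linear rules (w1, w2, b)."""
    grid = [i / 2 for i in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
    best = 0.0
    for w1, w2, b in product(grid, repeat=3):
        acc = accuracy(lambda x: linear_predict(w1, w2, b, x))
        best = max(best, acc)
    return best


def relative_change(baseline, new):
    """Signed change of `new` relative to `baseline` (-0.19 means 19% lower)."""
    return (new - baseline) / baseline
```

Under these assumptions, `best_linear_accuracy()` stays at 0.75 because XOR is not linearly separable, while `accuracy(tiny_network_predict)` reaches 1.0. With hypothetical error rates of 0.40 for a baseline and 0.324 for a new model, `relative_change(0.40, 0.324)` is approximately -0.19, that is, a 19% relative reduction.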

AVAILABILITY AND IMPLEMENTATION

All data and source code are available at https://cbcl.ics.uci.edu/public_data/DANN/.

Similar articles

A general framework for estimating the relative pathogenicity of human genetic variants.

Nat Genet. 2014 Mar;46(3):310-5. doi: 10.1038/ng.2892. Epub 2014 Feb 2.

The deep arbitrary polynomial chaos neural network or how Deep Artificial Neural Networks could benefit from data-driven homogeneous chaos theory.

Neural Netw. 2023 Sep;166:85-104. doi: 10.1016/j.neunet.2023.06.036. Epub 2023 Jul 10.

CADD: predicting the deleteriousness of variants throughout the human genome.

Nucleic Acids Res. 2019 Jan 8;47(D1):D886-D894. doi: 10.1093/nar/gky1016.

AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU.

BMC Bioinformatics. 2019 Oct 7;20(1):488. doi: 10.1186/s12859-019-3049-1.

Optimizing neural networks for medical data sets: A case study on neonatal apnea prediction.

Artif Intell Med. 2019 Jul;98:59-76. doi: 10.1016/j.artmed.2019.07.008. Epub 2019 Jul 25.

A comparative study on feature selection for a risk prediction model for colorectal cancer.

Comput Methods Programs Biomed. 2019 Aug;177:219-229. doi: 10.1016/j.cmpb.2019.06.001. Epub 2019 Jun 4.

A machine learning-based treatment prediction model using whole genome variants of hepatitis C virus.

PLoS One. 2020 Nov 5;15(11):e0242028. doi: 10.1371/journal.pone.0242028. eCollection 2020.

CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions.

Nucleic Acids Res. 2024 Jan 5;52(D1):D1143-D1154. doi: 10.1093/nar/gkad989.

Genome-Wide Functional Annotation of Human Protein-Coding Splice Variants Using Multiple Instance Learning.

J Proteome Res. 2016 Jun 3;15(6):1747-53. doi: 10.1021/acs.jproteome.5b00883. Epub 2016 May 9.

Cited by

Lenticulostriate vasculopathy in newborns: whole genome sequencing data analysis.

Front Pediatr. 2025 Aug 14;13:1531086. doi: 10.3389/fped.2025.1531086. eCollection 2025.

Expanding the Phenotypic Spectrum of SPG4: Autism Spectrum Disorder in Early-Onset and Complex SPAST-HSP and Case Study.

Genes (Basel). 2025 Aug 18;16(8):970. doi: 10.3390/genes16080970.

Whole-miRNome sequencing: a panel for the targeted sequencing of all human miRNA genes.

Nucleic Acids Res. 2025 Aug 27;53(16). doi: 10.1093/nar/gkaf812.

PUMA-induced apoptosis drives bone marrow failure and genomic instability in telomerase-deficient mice.

Cell Death Differ. 2025 Aug 19. doi: 10.1038/s41418-025-01557-w.

Tokenization and deep learning architectures in genomics: A comprehensive review.

Comput Struct Biotechnol J. 2025 Jul 28;27:3547-3555. doi: 10.1016/j.csbj.2025.07.038. eCollection 2025.

An exploration of testing genetic associations using goodness-of-fit statistics based on deep ReLU neural networks.

Front Syst Biol. 2024 Nov 18;4:1460369. doi: 10.3389/fsysb.2024.1460369. eCollection 2024.

Prediction of human pathogenic start loss variants based on self-supervised contrastive learning.

BMC Biol. 2025 Aug 8;23(1):250. doi: 10.1186/s12915-025-02348-y.

Clinical, laboratory and molecular features of glycogen storage disease type 1a and 1b patients from Turkey: novel mutations and phenotypes.

Eur J Pediatr. 2025 Aug 9;184(9):540. doi: 10.1007/s00431-025-06371-7.

Tubulin tyrosine ligase variant perturbs microtubule tyrosination, causing hypertrophy in patient-specific and CRISPR gene-edited iPSC-cardiomyocytes.

JCI Insight. 2025 Aug 8;10(15). doi: 10.1172/jci.insight.187942.

varCADD: large sets of standing genetic variation enable genome-wide pathogenicity prediction.

Genome Med. 2025 Aug 4;17(1):84. doi: 10.1186/s13073-025-01517-6.

References

A general framework for estimating the relative pathogenicity of human genetic variants.

Nat Genet. 2014 Mar;46(3):310-5. doi: 10.1038/ng.2892. Epub 2014 Feb 2.

Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.

Nature. 2013 Jan 10;493(7431):216-20. doi: 10.1038/nature11690. Epub 2012 Nov 28.

One-stop shop for disease genes.

Nature. 2012 Nov 8;491(7423):171. doi: 10.1038/491171a.
