DOMINO: Using Machine Learning to Predict Genes Associated with Dominant Disorders.

Authors

Quinodoz Mathieu, Royer-Bertrand Beryl, Cisarova Katarina, Di Gioia Silvio Alessandro, Superti-Furga Andrea, Rivolta Carlo

Affiliations

Department of Computational Biology, Unit of Medical Genetics, University of Lausanne, 1011 Lausanne, Switzerland.

Department of Computational Biology, Unit of Medical Genetics, University of Lausanne, 1011 Lausanne, Switzerland; Division of Genetic Medicine, Lausanne University Hospital (CHUV), 1011 Lausanne, Switzerland.

Publication information

Am J Hum Genet. 2017 Oct 5;101(4):623-629. doi: 10.1016/j.ajhg.2017.09.001.

Abstract

In contrast to recessive conditions with biallelic inheritance, identification of dominant (monoallelic) mutations for Mendelian disorders is more difficult, because of the abundance of benign heterozygous variants that act as massive background noise (typically, in a 400:1 excess ratio). To reduce this overflow of false positives in next-generation sequencing (NGS) screens, we developed DOMINO, a tool assessing the likelihood for a gene to harbor dominant changes. Unlike commonly used predictors of pathogenicity, DOMINO takes into consideration features that are the properties of genes, rather than of variants. It uses a machine-learning approach to extract discriminant information from a broad array of features (N = 432), including genomic data, intra- and interspecies conservation, gene expression, protein-protein interactions, protein structure, etc. DOMINO's iterative architecture includes a training process on 985 genes with well-established inheritance patterns for Mendelian conditions, and repeated cross-validation that optimizes its discriminant power. When validated on 99 newly discovered genes with pathogenic mutations, the algorithm displays an excellent final performance, with an area under the curve (AUC) of 0.92. Furthermore, unsupervised analysis by DOMINO of real sets of NGS data from individuals with intellectual disability or epilepsy correctly recognizes known genes and predicts 9 new candidates, with very high confidence. In summary, DOMINO is a robust and reliable tool that can infer dominance of candidate genes with high sensitivity and specificity, making it a useful complement to any NGS pipeline dealing with the analysis of the morbid human genome.
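To make the reported performance metric concrete, the following minimal Python sketch shows one standard way an AUC can be computed from gene-level scores: as the fraction of (positive, negative) pairs in which the positive gene receives the higher score, with ties counted as one half. This is not DOMINO's code. The function name `roc_auc`, the labels, and the scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Return the AUC as the probability that a randomly chosen positive
    example is scored higher than a randomly chosen negative one.
    Ties contribute 0.5. Labels must be 1 (positive) or 0 (negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper):
# 1 = gene with dominant inheritance, 0 = gene with recessive inheritance.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy set, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9. The pairwise form is quadratic in the number of genes, which is acceptable at the scale of a validation set of 99 genes; a rank-based formulation would be preferable for much larger sets.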
