DOMINO: Using Machine Learning to Predict Genes Associated with Dominant Disorders.

Authors

Quinodoz Mathieu, Royer-Bertrand Beryl, Cisarova Katarina, Di Gioia Silvio Alessandro, Superti-Furga Andrea, Rivolta Carlo

Affiliations

Department of Computational Biology, Unit of Medical Genetics, University of Lausanne, 1011 Lausanne, Switzerland.

Department of Computational Biology, Unit of Medical Genetics, University of Lausanne, 1011 Lausanne, Switzerland; Division of Genetic Medicine, Lausanne University Hospital (CHUV), 1011 Lausanne, Switzerland.

Publication details

Am J Hum Genet. 2017 Oct 5;101(4):623-629. doi: 10.1016/j.ajhg.2017.09.001.

Abstract

In contrast to recessive conditions with biallelic inheritance, identifying dominant (monoallelic) mutations for Mendelian disorders is more difficult because of the abundance of benign heterozygous variants, which act as massive background noise (typically in a 400:1 excess ratio). To reduce this overflow of false positives in next-generation sequencing (NGS) screens, we developed DOMINO, a tool that assesses the likelihood that a gene harbors dominant changes. Unlike commonly used predictors of pathogenicity, DOMINO considers features that are properties of genes rather than of variants. It uses a machine-learning approach to extract discriminant information from a broad array of features (N = 432), including genomic data, intra- and interspecies conservation, gene expression, protein-protein interactions, and protein structure. DOMINO's iterative architecture includes a training process on 985 genes with well-established inheritance patterns for Mendelian conditions, and repeated cross-validation that optimizes its discriminant power. When validated on 99 newly discovered genes with pathogenic mutations, the algorithm displays an excellent final performance, with an area under the curve (AUC) of 0.92. Furthermore, DOMINO's unsupervised analysis of real NGS datasets from individuals with intellectual disability or epilepsy correctly recognizes known genes and predicts 9 new candidates with very high confidence. In summary, DOMINO is a robust and reliable tool that can infer the dominance of candidate genes with high sensitivity and specificity, making it a useful complement to any NGS pipeline dealing with the analysis of the morbid human genome.
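The abstract describes training a gene-level classifier and judging it by cross-validation and by the area under the ROC curve (AUC). The following is a minimal Python sketch of that evaluation loop under stated assumptions. It does not reproduce DOMINO's actual learner or its 432 features. The nearest-centroid scorer, the function names `roc_auc`, `cross_validated_auc` and `nearest_centroid_trainer`, and the label encoding (1 = dominant, 0 = recessive) are hypothetical stand-ins chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical types: each gene is a vector of gene-level features
# (e.g. conservation, expression); label 1 = dominant, 0 = recessive.
Features = Sequence[float]
Scorer = Callable[[Features], float]
Trainer = Callable[[List[Features], List[int]], Scorer]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random dominant gene outscores a
    random recessive gene; ties count as half a win."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one gene of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def nearest_centroid_trainer(X: List[Features], y: List[int]) -> Scorer:
    """Hypothetical stand-in model: score genes by how much closer they lie
    to the dominant-class centroid than to the recessive-class centroid."""
    def centroid(rows: List[Features]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_dom = centroid([x for x, lab in zip(X, y) if lab == 1])
    c_rec = centroid([x for x, lab in zip(X, y) if lab == 0])

    def sq_dist(a: Features, b: Features) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Positive score = nearer the dominant centroid than the recessive one.
    return lambda x: sq_dist(x, c_rec) - sq_dist(x, c_dom)


def cross_validated_auc(X: List[Features], y: List[int], train: Trainer,
                        k: int = 5, seed: int = 0) -> float:
    """Mean held-out AUC over k stratified folds: each fold is scored by a
    model trained on the remaining k - 1 folds."""
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    if k < 2 or len(pos) < k or len(neg) < k:
        raise ValueError("need k >= 2 and at least k genes per class")
    rng = random.Random(seed)
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Stratify: deal each class round-robin so every fold holds both classes.
    folds = [pos[f::k] + neg[f::k] for f in range(k)]
    aucs = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(roc_auc([model(X[i]) for i in fold], [y[i] for i in fold]))
    return sum(aucs) / len(aucs)
```

Usage: with a list `X` of per-gene feature vectors and matching labels `y`, `cross_validated_auc(X, y, nearest_centroid_trainer, k=5)` returns the mean held-out AUC. Stratifying the folds by class keeps both dominant and recessive genes in every held-out fold, so each fold's AUC is always defined. A cross-validated AUC on training genes is a different quantity from the 0.92 reported above, which was measured on a separate set of 99 newly discovered genes.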
