DOMINO: Using Machine Learning to Predict Genes Associated with Dominant Disorders.

Author information

Quinodoz Mathieu, Royer-Bertrand Beryl, Cisarova Katarina, Di Gioia Silvio Alessandro, Superti-Furga Andrea, Rivolta Carlo

Affiliations

Department of Computational Biology, Unit of Medical Genetics, University of Lausanne, 1011 Lausanne, Switzerland.

Department of Computational Biology, Unit of Medical Genetics, University of Lausanne, 1011 Lausanne, Switzerland; Division of Genetic Medicine, Lausanne University Hospital (CHUV), 1011 Lausanne, Switzerland.

Publication information

Am J Hum Genet. 2017 Oct 5;101(4):623-629. doi: 10.1016/j.ajhg.2017.09.001.

DOI:10.1016/j.ajhg.2017.09.001

PMID:28985496

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5630195/

Abstract

In contrast to recessive conditions with biallelic inheritance, identification of dominant (monoallelic) mutations for Mendelian disorders is more difficult because of the abundance of benign heterozygous variants that act as massive background noise (typically in a 400:1 excess ratio). To reduce this overflow of false positives in next-generation sequencing (NGS) screens, we developed DOMINO, a tool assessing the likelihood for a gene to harbor dominant changes. Unlike commonly used predictors of pathogenicity, DOMINO takes into consideration features that are properties of genes rather than of variants. It uses a machine-learning approach to extract discriminant information from a broad array of features (N = 432), including genomic data, intra- and interspecies conservation, gene expression, protein-protein interactions, protein structure, etc. DOMINO's iterative architecture includes a training process on 985 genes with well-established inheritance patterns for Mendelian conditions, and repeated cross-validation that optimizes its discriminant power. When validated on 99 newly discovered genes with pathogenic mutations, the algorithm displays an excellent final performance, with an area under the curve (AUC) of 0.92. Furthermore, unsupervised analysis by DOMINO of real sets of NGS data from individuals with intellectual disability or epilepsy correctly recognizes known genes and predicts 9 new candidates with very high confidence. In summary, DOMINO is a robust and reliable tool that can infer dominance of candidate genes with high sensitivity and specificity, making it a useful complement to any NGS pipeline dealing with the analysis of the morbid human genome.
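
The AUC figure reported in the abstract can be read as the probability that a randomly chosen dominant gene receives a higher score than a randomly chosen gene of the other class. The minimal Python sketch below illustrates that definition on made-up data. The function name `auc_from_scores`, the labels, and the scores are all hypothetical; none of them come from DOMINO or its validation set, and this is not the authors' implementation.

```python
from itertools import product


def auc_from_scores(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    labels: iterable of 1 (positive class) or 0 (negative class)
    scores: iterable of numeric scores, higher meaning "more likely positive"
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = dominant gene, 0 = other gene.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]

# Three of the four positive/negative pairs are ranked correctly.
toy_auc = auc_from_scores(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of genes, which is acceptable for a validation set of 99 genes; a rank-based computation would scale better on larger sets.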
