Quinodoz Mathieu, Royer-Bertrand Beryl, Cisarova Katarina, Di Gioia Silvio Alessandro, Superti-Furga Andrea, Rivolta Carlo
Department of Computational Biology, Unit of Medical Genetics, University of Lausanne, 1011 Lausanne, Switzerland.
Department of Computational Biology, Unit of Medical Genetics, University of Lausanne, 1011 Lausanne, Switzerland; Division of Genetic Medicine, Lausanne University Hospital (CHUV), 1011 Lausanne, Switzerland.
Am J Hum Genet. 2017 Oct 5;101(4):623-629. doi: 10.1016/j.ajhg.2017.09.001.
In contrast to recessive conditions with biallelic inheritance, dominant (monoallelic) mutations causing Mendelian disorders are more difficult to identify, because the abundance of benign heterozygous variants acts as massive background noise (typically in a 400:1 excess). To reduce this overflow of false positives in next-generation sequencing (NGS) screens, we developed DOMINO, a tool that assesses the likelihood that a gene harbors dominant changes. Unlike commonly used predictors of pathogenicity, DOMINO considers features that are properties of genes rather than of variants. It uses a machine-learning approach to extract discriminant information from a broad array of features (N = 432), including genomic data, intra- and interspecies conservation, gene expression, protein-protein interactions, and protein structure. DOMINO's iterative architecture includes a training process on 985 genes with well-established inheritance patterns for Mendelian conditions, and repeated cross-validation that optimizes its discriminant power. When validated on 99 newly discovered genes with pathogenic mutations, the algorithm displays an excellent final performance, with an area under the curve (AUC) of 0.92. Furthermore, unsupervised analysis by DOMINO of real sets of NGS data from individuals with intellectual disability or epilepsy correctly recognizes known genes and predicts 9 new candidates with very high confidence. In summary, DOMINO is a robust and reliable tool that can infer dominance of candidate genes with high sensitivity and specificity, making it a useful complement to any NGS pipeline dealing with the analysis of the morbid human genome.
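As a rough illustration of two ideas in the abstract, the sketch below shows how a gene-level dominance score might be used to thin out heterozygous candidates, and how an AUC such as the one reported can be computed from scores and known labels. It is not DOMINO's implementation: the function names, the 0.5 threshold, the variant record layout, and the gene names and scores are all hypothetical, chosen only for the example.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive gene receives the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def filter_heterozygous(variants, gene_scores, threshold=0.5):
    """Keep heterozygous variants whose gene scores at or above the threshold.
    Genes without a score are dropped (an assumption made for this sketch)."""
    return [
        v for v in variants
        if v["zygosity"] == "het" and gene_scores.get(v["gene"], 0.0) >= threshold
    ]


# Hypothetical gene-level scores: higher means more likely dominant.
gene_scores = {"GENE_A": 0.95, "GENE_B": 0.10, "GENE_C": 0.70}

# Hypothetical variant calls from one individual.
variants = [
    {"gene": "GENE_A", "zygosity": "het"},
    {"gene": "GENE_B", "zygosity": "het"},
    {"gene": "GENE_C", "zygosity": "hom"},
    {"gene": "GENE_D", "zygosity": "het"},
]

candidates = filter_heterozygous(variants, gene_scores)
print([v["gene"] for v in candidates])

# Labels: 1 = known dominant gene, 0 = known recessive gene (made-up data).
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

Because the score is attached to the gene rather than to the variant, the same lookup applies to every heterozygous call in that gene; in practice it would be combined with variant-level pathogenicity predictors rather than replace them.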