Zucca S, Nicora G, De Paoli F, Carta M G, Bellazzi R, Magni P, Rizzo E, Limongelli I
enGenome Srl, 27100, Pavia, Italy.
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
Hum Genet. 2025 Mar;144(2-3):159-171. doi: 10.1007/s00439-023-02638-x. Epub 2024 Mar 23.
Identifying disease-causing variants in the genomes of rare disease patients is a challenging problem. To address it, we describe a machine learning framework, which we call "Suggested Diagnosis", that prioritizes the genetic variants in an exome or genome according to their probability of being disease-causing. The method leverages the standard guidelines for germline variant interpretation defined by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP), together with inheritance information, phenotypic similarity, and variant quality. Starting from (1) the VCF file containing the proband's variants, (2) the list of the proband's phenotypes encoded as Human Phenotype Ontology terms, and optionally (3) information about family members, "Suggested Diagnosis" ranks all variants according to their machine learning prediction. By placing causative variants in the very first positions of the prioritized list, the method significantly reduces the number of variants that geneticists need to evaluate. Most importantly, our approach was among the top performers in the CAGI6 Rare Genomes Project Challenge, where it ranked the true causative variant among the first positions and, uniquely among all challenge participants, increased the diagnostic yield by 12.5% by solving 2 undiagnosed cases.
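The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the model, the feature encoding, or any weights. The `Variant` fields, the `WEIGHTS` and `BIAS` values, and the logistic scoring function are all hypothetical stand-ins for a trained classifier, and the variant identifiers are made up. The sketch only shows the general idea of combining ACMG/AMP evidence, phenotypic similarity, inheritance fit, and variant quality into one score and sorting variants by it.

```python
from dataclasses import dataclass
from math import exp


@dataclass(frozen=True)
class Variant:
    """One candidate variant with illustrative features scaled to [0, 1]."""
    variant_id: str
    acmg_evidence: float         # hypothetical summary of ACMG/AMP criteria
    phenotype_similarity: float  # hypothetical HPO-based gene/phenotype match
    inheritance_fit: float       # hypothetical agreement with family data
    quality: float               # hypothetical variant-call quality


# Hypothetical weights standing in for a trained model's parameters.
WEIGHTS = {
    "acmg_evidence": 2.0,
    "phenotype_similarity": 2.0,
    "inheritance_fit": 1.0,
    "quality": 1.0,
}
BIAS = -3.0


def score(variant: Variant) -> float:
    """Return a pseudo-probability in (0, 1) via a logistic function."""
    z = BIAS + sum(w * getattr(variant, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + exp(-z))


def rank_variants(variants: list[Variant]) -> list[Variant]:
    """Sort variants from most to least likely to be disease-causing."""
    return sorted(variants, key=score, reverse=True)


# Made-up candidates for illustration only.
candidates = [
    Variant("var_A", 0.2, 0.3, 0.0, 0.9),
    Variant("var_B", 0.9, 0.8, 1.0, 0.95),
    Variant("var_C", 0.6, 0.1, 1.0, 0.5),
]

for position, variant in enumerate(rank_variants(candidates), start=1):
    print(position, variant.variant_id, round(score(variant), 3))
```

In a real pipeline the features would be derived from the VCF file, the HPO term list, and the optional family information, and the scoring function would be replaced by the trained model's predicted probability. The sorting step stays the same.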