Department of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, National University Health System (NUHS), 1E Kent Ridge Road, 119228, Singapore.
Eur J Hum Genet. 2019 Sep;27(9):1389-1397. doi: 10.1038/s41431-019-0412-7. Epub 2019 May 3.
Selection and prioritization of phenotype-centric variants remains a challenging part of variant analysis and interpretation in clinical exome sequencing. Phenotype-driven shortlisting of patient-specific gene lists can help avoid missed diagnoses. Here, we analyzed the relevance of using primary Human Phenotype Ontology identifiers (HPO IDs) in prioritizing Mendelian disease genes across 30 in-house, 10 previously reported, and 10 recently published cases using three popular web-based gene prioritization tools (OMIMExplorer, VarElect, and Phenolyzer). We assessed partial HPO-based gene prioritization using either randomly chosen HPO IDs or the top 10%, 30%, and 50% of HPO IDs ranked by information content, and found high variance in rank ratios for the former compared with the latter. This indicated that randomly selected, less-specific HPO IDs for a given disease phenotype performed poorly, ranking the probe gene farther from the top of the list. In contrast, using the top 10%, 30%, or 50% of HPO IDs individually could rank the probe gene among the top 1% of the ranked gene list, equivalent to the results obtained when the full list of HPO IDs was used. Hence, we conclude that using just the top 10% of HPO IDs chosen by information content is sufficient for ranking the probe gene at the top position. Our findings provide practical guidance for using structured phenotype semantics and web-based gene-ranking tools to aid in identifying known as well as unknown candidate gene associations in Mendelian disorders.
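The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state how information content or the rank ratio were computed, so the sketch assumes the common definition IC(t) = -ln(n_t / N), where n_t is the number of annotated diseases carrying term t and N is the total number of annotated diseases, and it assumes the rank ratio is the probe gene's rank divided by the length of the ranked gene list. The function names, annotation counts, and gene labels are all hypothetical.

```python
import math


def information_content(term_counts, total):
    """Assumed definition: IC(t) = -ln(n_t / total).

    term_counts maps an HPO ID to the number of diseases annotated with it;
    total is the number of annotated diseases. Rarer terms get a higher IC.
    """
    return {t: -math.log(n / total) for t, n in term_counts.items() if n > 0}


def top_fraction_by_ic(patient_terms, ic, fraction):
    """Return the most specific `fraction` of a patient's HPO IDs.

    Terms are sorted by descending IC; at least one term is always kept.
    Terms missing from `ic` are treated as uninformative (IC = 0).
    """
    ranked = sorted(patient_terms, key=lambda t: ic.get(t, 0.0), reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:k]


def rank_ratio(ranked_genes, probe_gene):
    """Assumed definition: 1-based rank of the probe gene / list length."""
    return (ranked_genes.index(probe_gene) + 1) / len(ranked_genes)


# Illustrative annotation counts only (not real HPO statistics).
counts = {
    "HP:0000118": 8000,  # very general term, carried by every disease here
    "HP:0001250": 1000,
    "HP:0001263": 1200,
    "HP:0002133": 50,    # rare, therefore highly specific
}
ic = information_content(counts, total=8000)

patient = ["HP:0000118", "HP:0001250", "HP:0001263", "HP:0002133"]
top_10 = top_fraction_by_ic(patient, ic, 0.10)
top_50 = top_fraction_by_ic(patient, ic, 0.50)

# Hypothetical output of a prioritization tool, best candidate first.
ratio = rank_ratio(["GENE_A", "GENE_B", "GENE_C", "GENE_D"], "GENE_B")
```

In this sketch the selected subset would be submitted to a prioritization tool in place of the full term list, and the resulting rank ratio of the known causal gene compared across subsets.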