School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX.
Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX.
AMIA Annu Symp Proc. 2023 Apr 29;2022:785-794. eCollection 2022.
Auditing the Human Phenotype Ontology (HPO) is necessary to ensure that it provides accurate terminology for clinical research. We investigate an approach that leverages the lexical features of HPO concepts to identify missing IS-A relations among them. We first model the name of each HPO concept as a set of lowercase words. We then generate two types of concept-pairs whose names share at least one word: (1) linked concept-pairs, in which the two concepts have an IS-A relation; and (2) unlinked concept-pairs, in which the two concepts have no IS-A relation. Each concept-pair generates a Derived Term Pair (DTP) that emphasizes the lexical information unique to each concept. If a linked concept-pair and an unlinked concept-pair generate the same DTP, we suggest a potential missing IS-A relation between the concepts of the unlinked pair. Applying our approach to the 2022-02-14 release of HPO, we uncovered 2,516 potential missing IS-A relations. We validated 59 of these missing IS-A relations against the Unified Medical Language System (UMLS) by mapping each concept-pair to UMLS concepts and verifying that UMLS records an IS-A relation between the two concepts.
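The matching step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define how a DTP is built, so the sketch assumes a DTP is the pair of word sets that remain after the shared words are removed from each name. The function names, the concept identifiers, and the toy concept names are all hypothetical and are not taken from the HPO release.

```python
from itertools import permutations


def words(name):
    """Model a concept name as a set of lowercase words."""
    return frozenset(name.lower().split())


def derived_term_pair(child_name, parent_name):
    """Return a DTP, or None if the names share no word.

    ASSUMPTION: the DTP is (words unique to child, words unique to parent).
    The abstract does not spell out the construction.
    """
    child, parent = words(child_name), words(parent_name)
    common = child & parent
    if not common or child == parent:
        return None
    return (child - common, parent - common)


def suggest_missing_isa(names, is_a):
    """Suggest (child, parent, supporting linked pair) triples.

    names: dict mapping a concept id to its name.
    is_a:  set of (child_id, parent_id) pairs. For a faithful run this should
           hold every existing IS-A relation, including inherited ones;
           the sketch only checks what it is given.
    """
    # DTPs produced by linked concept-pairs, with one supporting example each.
    linked = {}
    for child, parent in is_a:
        dtp = derived_term_pair(names[child], names[parent])
        if dtp is not None:
            linked.setdefault(dtp, (child, parent))

    # Ordered unlinked pairs whose DTP also occurs among linked pairs.
    suggestions = []
    for child, parent in permutations(names, 2):
        if (child, parent) in is_a or (parent, child) in is_a:
            continue
        dtp = derived_term_pair(names[child], names[parent])
        if dtp in linked:
            suggestions.append((child, parent, linked[dtp]))
    return suggestions


# Toy data with hypothetical ids; only A IS-A B is recorded.
names = {
    "A": "Bilateral renal agenesis",
    "B": "Renal agenesis",
    "C": "Bilateral renal hypoplasia",
    "D": "Renal hypoplasia",
}
is_a = {("A", "B")}

suggestions = suggest_missing_isa(names, is_a)
print(suggestions)
```

In the toy data, the linked pair (A, B) and the unlinked pair (C, D) both reduce to the same DTP (the word "bilateral" on the child side and nothing on the parent side), so the sketch proposes C IS-A D as a candidate. The all-pairs loop is quadratic in the number of concepts; a run over a full ontology would likely need an index from words to concepts, which is omitted here for brevity.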