McArthur Evonne, Bastarache Lisa, Capra John A
Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, USA.
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
JAMIA Open. 2023 Feb 28;6(1):ooad007. doi: 10.1093/jamiaopen/ooad007. eCollection 2023 Apr.
Enabling discovery across the spectrum of rare and common diseases requires integrating biological knowledge with clinical data; however, differences in terminologies present a major barrier. For example, the Human Phenotype Ontology (HPO) is the primary vocabulary for describing features of rare diseases, while most clinical encounters use International Classification of Diseases (ICD) billing codes. ICD codes are further organized into clinically meaningful phenotypes via phecodes. Although these vocabularies are widely used, no robust phenome-wide disease mapping between HPO and phecodes/ICD exists. Here, we synthesize evidence from diverse sources and methods, including text matching, the National Library of Medicine's Unified Medical Language System (UMLS), Wikipedia, SORTA, and PheMap, to define a mapping between phecodes and HPO terms via 38 950 links. We evaluate the precision and recall of each domain of evidence, both individually and jointly. Evaluating the evidence domains separately and in combination gives users the flexibility to tailor the HPO-phecode links for diverse applications along the spectrum of monogenic to polygenic diseases.
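As a rough illustration of evaluating evidence domains individually and jointly, the following Python sketch scores candidate phecode-HPO links against a reference set. It is not the authors' pipeline: the identifiers (`phe1`, `hpA`, and so on), the reference set `gold`, and the rule of keeping links supported by at least `min_sources` domains are all hypothetical assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy data: each evidence domain proposes (phecode, HPO term) links.
# These identifiers are placeholders, not real phecodes or HPO IDs.
evidence = {
    "text_match": {("phe1", "hpA"), ("phe2", "hpB")},
    "umls": {("phe1", "hpA"), ("phe3", "hpC")},
    "wikipedia": {("phe1", "hpA"), ("phe2", "hpB"), ("phe4", "hpD")},
}

# Hypothetical reference set of links assumed to be correct.
gold = {("phe1", "hpA"), ("phe2", "hpB"), ("phe3", "hpC")}


def precision_recall(predicted, reference):
    """Return (precision, recall) of predicted links against the reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def links_with_support(evidence_by_domain, min_sources):
    """Keep links proposed by at least min_sources domains (an assumed combination rule)."""
    counts = Counter(
        link for links in evidence_by_domain.values() for link in links
    )
    return {link for link, n in counts.items() if n >= min_sources}


if __name__ == "__main__":
    # Individual evaluation: one domain at a time.
    for name, links in evidence.items():
        p, r = precision_recall(links, gold)
        print(f"{name}: precision={p:.2f} recall={r:.2f}")

    # Joint evaluation: vary how many domains must agree on a link.
    for k in (1, 2, 3):
        p, r = precision_recall(links_with_support(evidence, k), gold)
        print(f"min_sources={k}: precision={p:.2f} recall={r:.2f}")
```

In this toy setup, raising `min_sources` trades recall for precision, which is one way a user might tune a link set toward a stricter or a more inclusive application.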