Zhu Qian, Nguyen Dac-Trung, Alyea Gioconda, Hanson Karen, Sid Eric, Pariser Anne
Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States.
ICF International Inc, Rockville, MD, United States.
JMIR Med Inform. 2020 Oct 2;8(10):e18395. doi: 10.2196/18395.
Although many efforts have been made to develop comprehensive disease resources that capture rare disease information for clinical decision making and education, there is no standardized protocol for defining and harmonizing rare diseases across multiple resources. The resulting data redundancy and inconsistency may ultimately cause confusion and hinder the wide use of these resources. To overcome these obstacles, we report a preliminary study that identifies phenotypical similarity among genetic and rare diseases (GARD) that present similar clinical manifestations, in support of further data harmonization.
To support rare disease data harmonization, we aim to systematically identify phenotypically similar GARD diseases from a disease-oriented integrative knowledge graph and determine their similarity types.
We identified phenotypically similar GARD diseases programmatically with 2 methods: (1) we measured disease similarity by comparing disease mappings between GARD and other rare disease resources, incorporating manual assessment; and (2) we derived clinical manifestations present among sibling diseases from disease classifications and prioritized the identified similar diseases based on their phenotypes and genotypes.
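The two methods can be illustrated with a minimal sketch. It assumes, hypothetically, that each disease carries a set of cross-resource identifiers and that mapping overlap is scored with a Jaccard index; the abstract does not name the similarity measure actually used. The function names, identifiers, and toy data below are all placeholders invented for illustration, not records from GARD, MONDO, or Orphanet.

```python
from itertools import combinations


def mapping_similarity(mappings_a: set[str], mappings_b: set[str]) -> float:
    """Jaccard overlap of two diseases' cross-resource mappings (assumed measure)."""
    union = mappings_a | mappings_b
    if not union:
        return 0.0
    return len(mappings_a & mappings_b) / len(union)


def similar_siblings(children, phenotypes, genes):
    """Yield sibling pairs that share at least one phenotype and one gene.

    children:   parent class -> list of child disease IDs (classification tree)
    phenotypes: disease ID -> set of phenotype IDs
    genes:      disease ID -> set of gene symbols
    """
    for siblings in children.values():
        for a, b in combinations(sorted(siblings), 2):
            shared_p = phenotypes.get(a, set()) & phenotypes.get(b, set())
            shared_g = genes.get(a, set()) & genes.get(b, set())
            if shared_p and shared_g:
                yield a, b, shared_p, shared_g


# Hypothetical toy data; every identifier here is a placeholder.
mappings = {
    "GARD:A": {"ORPHA:1", "OMIM:10", "MONDO:100"},
    "GARD:B": {"ORPHA:1", "OMIM:10", "MONDO:101"},
}
# Method 1: two shared mappings out of four distinct ones.
score = mapping_similarity(mappings["GARD:A"], mappings["GARD:B"])

# Method 2: siblings under one parent, filtered by shared phenotype and gene.
children = {"PARENT:1": ["GARD:A", "GARD:B", "GARD:C"]}
phenotypes = {"GARD:A": {"HP:1", "HP:2"}, "GARD:B": {"HP:2"}, "GARD:C": {"HP:3"}}
genes = {"GARD:A": {"GENE1"}, "GARD:B": {"GENE1"}, "GARD:C": {"GENE2"}}
pairs = list(similar_siblings(children, phenotypes, genes))
```

In this toy example only the pair GARD:A and GARD:B survives the sibling filter, because GARD:C shares neither a phenotype nor a gene with the others. Candidate pairs produced this way would still be subject to the manual assessment described above.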
For disease similarity comparison, approximately 87% (341/392) of the identified phenotypically similar disease pairs were validated; 80% (271/392) of these disease pairs were accurately identified as phenotypically similar based on similarity score. The evaluation shows high precision (94%) and satisfactory overall quality (86% F measure). By deriving phenotypical similarity from the Monarch Disease Ontology (MONDO) and Orphanet disease classification trees, we identified a total of 360 disease pairs with at least 1 shared clinical phenotype and gene, which were used to prioritize clinical relevance. A total of 662 phenotypically similar disease pairs were identified and will be applied for GARD data harmonization.
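As a worked check on the evaluation figures, the recall implied by the reported precision and F measure can be recovered by solving F = 2PR / (P + R) for R. This sketch assumes the F measure is the balanced F1 score, which the abstract does not state explicitly, and it works from the rounded percentages, so the result is only approximate.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_f(precision: float, f: float) -> float:
    """Solve F = 2PR / (P + R) for R, giving R = F*P / (2P - F)."""
    return f * precision / (2 * precision - f)


precision, f = 0.94, 0.86  # rounded values reported in the abstract
recall = recall_from_f(precision, f)
print(f"implied recall ~ {recall:.2f}")
```

Under these assumptions the implied recall comes out at about 0.79.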
We successfully identified phenotypically similar rare diseases among the GARD diseases via 2 approaches: disease mapping comparison and phenotypical similarity derivation from disease classification systems. The results will not only direct GARD data harmonization in expanding translational science research but will also improve data transparency and consistency across different disease resources and terminologies, helping to build a robust and up-to-date knowledge resource on rare diseases.