OARD: Open annotations for rare diseases and their phenotypes based on real-world data.

Affiliations

Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.

Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.

Publication information

Am J Hum Genet. 2022 Sep 1;109(9):1591-1604. doi: 10.1016/j.ajhg.2022.08.002. Epub 2022 Aug 22.

Abstract

Diagnosis of rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated, with additional annotations drawn from published case reports. Despite their potential, real-world data such as electronic health records (EHRs) have not been fully exploited to derive rare disease annotations. Here, we present open annotation for rare diseases (OARD), a real-world-data-derived resource with annotations for rare-disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions containing more than 10 million individuals spanning wide age ranges and different disease subgroups. By leveraging ontology mapping and advanced natural-language-processing (NLP) methods, OARD automatically and efficiently extracts concepts for both rare diseases and their phenotypic traits from billing codes and lab tests as well as over 100 million clinical narratives. The rare disease prevalence derived by OARD is highly correlated with that annotated in the original rare disease knowledgebase. By performing association analysis, we identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation, and more than 60% were confirmed as true associations via manual review of a list of sampled pairs. Compared to manually curated annotation, OARD is 100% data driven and its pipeline can be shared across different institutions. By supporting privacy-preserving sharing of aggregated summary statistics, such as term frequencies and disease-phenotype associations, it fills an important gap to facilitate data-driven research in the rare disease community.
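The abstract does not say which association statistic or privacy rule OARD uses, so the following is only a minimal sketch of the general idea: reduce patient-level concept sets to aggregated counts, suppress small counts before sharing, and score a disease-phenotype pair from those counts alone. The function names, the `min_count=10` suppression threshold, the log odds ratio with a 0.5 continuity correction, and the concept IDs `D1`, `D2`, and `P1` are all illustrative assumptions, not details from the paper.

```python
from collections import Counter
from itertools import product
import math


def aggregate_counts(patients, diseases, phenotypes, min_count=10):
    """Reduce patient-level concept sets to shareable summary counts.

    patients   : iterable of sets of concept IDs (one set per patient)
    diseases   : set of concept IDs treated as rare diseases
    phenotypes : set of concept IDs treated as phenotypes
    min_count  : hypothetical suppression threshold (an assumption here)
    """
    n = 0
    single = Counter()  # number of patients carrying each concept
    pair = Counter()    # number of patients carrying both disease and phenotype
    for concepts in patients:
        n += 1
        single.update(concepts)
        pair.update(product(concepts & diseases, concepts & phenotypes))
    # Drop small cells so that only aggregated statistics leave the site.
    single = {c: k for c, k in single.items() if k >= min_count}
    pair = {dp: k for dp, k in pair.items() if k >= min_count}
    return n, single, pair


def log_odds_ratio(n, n_d, n_p, n_dp):
    """Log odds ratio of a 2x2 co-occurrence table, 0.5 added to each cell."""
    a = n_dp + 0.5                    # disease and phenotype
    b = n_d - n_dp + 0.5              # disease only
    c = n_p - n_dp + 0.5              # phenotype only
    d = n - n_d - n_p + n_dp + 0.5    # neither
    return math.log((a * d) / (b * c))


# Toy cohort with made-up concept IDs.
cohort = (
    [{"D1", "P1"}] * 12 + [{"P1"}] * 3 + [{"D1"}] * 2
    + [set()] * 20 + [{"D2", "P1"}]
)
n, single, pair = aggregate_counts(cohort, {"D1", "D2"}, {"P1"})
score = log_odds_ratio(n, single["D1"], single["P1"], pair[("D1", "P1")])
```

In this toy cohort the single `D2` patient falls below the threshold, so neither `D2` nor the pair (`D2`, `P1`) appears in the shared output, while the (`D1`, `P1`) pair receives a positive score.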

Cited by

1. Identifying Phenotypes for Earlier Diagnosis of Rare Diseases. Stud Health Technol Inform. 2025 May 15;327:123-127. doi: 10.3233/SHTI250286.
