Clustering rare diseases within an ontology-enriched knowledge graph.

Affiliations

Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States.

Chief Technology Office, Booz Allen Hamilton, Bethesda, MD, United States.

Publication information

J Am Med Inform Assoc. 2023 Dec 22;31(1):154-164. doi: 10.1093/jamia/ocad186.

DOI:10.1093/jamia/ocad186
PMID:37759342
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10746319/
Abstract

OBJECTIVE

Identifying sets of rare diseases with shared aspects of etiology and pathophysiology may enable drug repurposing. Toward that aim, we utilized an integrative knowledge graph to construct clusters of rare diseases.

MATERIALS AND METHODS

Data on 3242 rare diseases were extracted from the internal data resources of the Genetic and Rare Diseases Information Center (GARD) at the National Center for Advancing Translational Sciences (NCATS). The rare disease data were enriched with additional biomedical data, including gene and phenotype ontologies, biological pathway data, and small molecule-target activity data, to create a knowledge graph (KG). Node embeddings were trained and clustered. We validated the disease clusters through semantic similarity and feature enrichment analysis.
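The abstract does not name the embedding or clustering algorithm, so the following is only a minimal sketch of the "embed, then cluster" step. It assumes k-means with a deterministic farthest-point initialization, applied to hypothetical 2-D vectors standing in for trained node embeddings. The disease names, vector values, and the `kmeans` helper are all illustrative and do not come from the paper.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain k-means over equal-length float vectors (illustrative helper)."""
    # Assumption: deterministic farthest-point initialization, so the toy
    # result does not depend on a random seed.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(math.dist(v, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical embeddings for six diseases (toy values, not from the paper).
toy_embeddings = {
    "disease_a": [0.0, 0.1],
    "disease_b": [0.1, 0.0],
    "disease_c": [0.05, 0.05],
    "disease_d": [5.0, 5.1],
    "disease_e": [5.1, 5.0],
    "disease_f": [4.9, 5.0],
}
names = list(toy_embeddings)
labels = kmeans([toy_embeddings[n] for n in names], k=2)

# Group disease names by cluster label.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
```

In a real pipeline the vectors would be node embeddings trained on the KG, and the number of clusters would be chosen empirically rather than fixed in advance.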

RESULTS

Thirty-seven disease clusters were created, with a mean size of 87 diseases. We validated the clusters quantitatively via semantic similarity based on the Orphanet Rare Disease Ontology. In addition, the clusters were analyzed for enrichment of associated genes, revealing that the enriched genes within clusters are highly related.
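The abstract does not describe how gene enrichment was computed. One common approach, assumed here purely for illustration, is a one-sided hypergeometric test: given a background of diseases, some of which are associated with a gene, how surprising is the number of associated diseases found in one cluster? The function name and all counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, cluster_size, overlap):
    """Return P(X >= overlap) under the hypergeometric distribution.

    population   -- total number of diseases in the background
    annotated    -- diseases in the background associated with the gene
    cluster_size -- number of diseases in the cluster being tested
    overlap      -- diseases in the cluster associated with the gene
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(annotated, x) * comb(population - annotated, cluster_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# Toy counts (not from the paper): 20 diseases, 5 linked to a gene,
# and a cluster of 5 diseases that contains 4 of the linked ones.
p_value = hypergeom_enrichment_p(
    population=20, annotated=5, cluster_size=5, overlap=4
)
```

A small p-value suggests that the gene is over-represented in the cluster. When many genes and clusters are tested, the p-values would normally need a multiple-testing correction.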

DISCUSSION

We demonstrate that node embeddings are an effective method for clustering diseases within a heterogeneous KG. Semantically similar diseases and relevant enriched genes have been uncovered within the clusters. Connections between disease clusters and drugs are enumerated for follow-up efforts.

CONCLUSION

We lay out a method for clustering rare diseases using graph node embeddings. We develop an easy-to-maintain pipeline that can be updated when new data on rare diseases emerge. The embeddings themselves can be paired with other representation learning methods for other data types, such as drugs, to address other predictive modeling problems.
