Clustering rare diseases within an ontology-enriched knowledge graph.

Affiliations

Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States.

Chief Technology Office, Booz Allen Hamilton, Bethesda, MD, United States.

Publication information

J Am Med Inform Assoc. 2023 Dec 22;31(1):154-164. doi: 10.1093/jamia/ocad186.

DOI:10.1093/jamia/ocad186

PMID:37759342

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10746319/

Abstract

OBJECTIVE

Identifying sets of rare diseases with shared aspects of etiology and pathophysiology may enable drug repurposing. Toward that aim, we utilized an integrative knowledge graph to construct clusters of rare diseases.

MATERIALS AND METHODS

Data on 3242 rare diseases were extracted from the internal data resources of the National Center for Advancing Translational Sciences Genetic and Rare Diseases Information Center. The rare disease data were enriched with additional biomedical data, including gene and phenotype ontologies, biological pathway data, and small molecule-target activity data, to create a knowledge graph (KG). Node embeddings were trained on the KG and then clustered. We validated the disease clusters through semantic similarity and feature enrichment analysis.
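To make the embed-then-cluster idea concrete, here is a minimal sketch on a toy graph. It is not the authors' pipeline: the abstract does not name the embedding or clustering algorithm, so this stand-in uses untrained random-walk profiles (the summed 1-step and 2-step walk distributions from each disease node) as vectors and groups diseases by single-link cosine similarity. All node names, edges, and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy KG: diseases linked to genes and phenotypes,
# and genes linked to a shared pathway node.
EDGES = [
    ("disease:D1", "gene:A"), ("disease:D2", "gene:A"),
    ("disease:D1", "pheno:X"), ("disease:D2", "pheno:X"),
    ("disease:D3", "gene:B"), ("disease:D4", "gene:B"),
    ("disease:D3", "pheno:Y"), ("disease:D4", "pheno:Y"),
    ("gene:A", "pathway:P"), ("gene:B", "pathway:P"),
]

def build_adjacency(edges):
    """Undirected adjacency sets for every node that appears in an edge."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def walk_profile(adj, start, steps=2):
    """Sum of the 1..steps-step uniform random-walk distributions from start."""
    dist = {start: 1.0}
    profile = defaultdict(float)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            for nb in adj[node]:
                nxt[nb] += p / len(adj[node])
        for node, p in nxt.items():
            profile[node] += p
        dist = nxt
    return dict(profile)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, threshold=0.5):
    """Single-link grouping: merge any pair with cosine >= threshold."""
    names = sorted(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(list)
    for n in names:
        groups[find(n)].append(n)
    return sorted(groups.values())

adj = build_adjacency(EDGES)
diseases = [n for n in adj if n.startswith("disease:")]
vectors = {d: walk_profile(adj, d) for d in diseases}
clusters = cluster(vectors)
print(clusters)
```

On this toy graph D1 and D2 share a gene and a phenotype, as do D3 and D4, while the two pairs touch only through the pathway node, so the sketch groups them into two clusters. A real pipeline would replace the walk profiles with trained embeddings and the threshold rule with a proper clustering algorithm.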

RESULTS

Thirty-seven disease clusters were created with a mean size of 87 diseases. We validated the clusters quantitatively via semantic similarity based on the Orphanet Rare Disease Ontology. In addition, the clusters were analyzed for enrichment of associated genes, revealing that the enriched genes within clusters are highly related.
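Gene enrichment within a cluster is commonly scored with a one-sided hypergeometric test. The abstract does not say which test the authors used, so the following is only an illustrative sketch under that assumption; the function name and the counts in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(total, annotated, cluster_size, hits):
    """P(X >= hits) when drawing cluster_size diseases without replacement
    from total diseases, of which annotated are linked to the gene."""
    upper = min(annotated, cluster_size)
    denom = comb(total, cluster_size)
    tail = sum(
        comb(annotated, i) * comb(total - annotated, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / denom

# Toy counts (not from the paper): 10 diseases overall, 3 linked to a gene,
# and a cluster of 3 diseases that contains all 3 of them.
p = enrichment_p_value(total=10, annotated=3, cluster_size=3, hits=3)
print(p)
```

A small p-value suggests the gene appears in the cluster more often than chance would predict. In practice the p-values would also need correction for multiple testing across genes and clusters.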

DISCUSSION

We demonstrate that node embeddings are an effective method for clustering diseases within a heterogeneous KG. Semantically similar diseases and relevant enriched genes have been uncovered within the clusters. Connections between disease clusters and drugs are enumerated for follow-up efforts.
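One simple way to quantify how semantically similar two diseases are in an ontology is the Jaccard overlap of their ancestor sets. The abstract does not specify the similarity measure applied to the Orphanet Rare Disease Ontology, so this is a sketch of one possible measure on a made-up is-a hierarchy; every term name below is hypothetical.

```python
# Hypothetical is-a hierarchy: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "metabolic": ["root"],
    "ataxia_A": ["neuro"],
    "ataxia_B": ["neuro"],
    "storage_C": ["metabolic"],
}

def ancestors(term, parents=PARENTS):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def jaccard_similarity(a, b):
    """Shared ancestors divided by the union of both ancestor sets."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

same_branch = jaccard_similarity("ataxia_A", "ataxia_B")
cross_branch = jaccard_similarity("ataxia_A", "storage_C")
print(same_branch, cross_branch)
```

Averaging such pairwise scores within each cluster and comparing them with scores for randomly drawn disease sets would give a quantitative check on cluster coherence.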

CONCLUSION

We lay out a method for clustering rare diseases using graph node embeddings. We develop an easy-to-maintain pipeline that can be updated when new data on rare diseases emerge. The embeddings themselves can be paired with other representation learning methods for other data types, such as drugs, to address other predictive modeling problems.
