Cross-modal embedding integrator for disease-gene/protein association prediction using a multi-head attention mechanism.

Affiliations

Education and Research Program for Future ICT Pioneers, Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea.

Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea.

Publication information

Pharmacol Res Perspect. 2024 Dec;12(6):e70034. doi: 10.1002/prp2.70034.

Abstract

Knowledge graphs, powerful tools that explicitly transfer knowledge to machines, have significantly advanced the inference of new knowledge. Discovering unknown relationships between diseases and genes/proteins in biomedical knowledge graphs can lead to the identification of disease development mechanisms and new treatment targets. Generating high-quality representations of biomedical entities is essential for successfully predicting disease-gene/protein associations. We developed a computational model that predicts disease-gene/protein associations using the Precision Medicine Knowledge Graph, a biomedical knowledge graph. Embeddings of biomedical entities were generated using two different methods: a large language model (LLM) and a knowledge graph embedding (KGE) algorithm. The LLM utilizes information obtained from massive amounts of text data, whereas the KGE algorithm relies on graph structures. We developed a disease-gene/protein association prediction model, "Cross-Modal Embedding Integrator (CMEI)," by integrating embeddings from different modalities using a multi-head attention mechanism. CMEI achieved an area under the receiver operating characteristic curve of 0.9662 (± 0.0002) in predicting disease-gene/protein associations. In conclusion, we developed a computational model that effectively predicts disease-gene/protein associations. CMEI may contribute to the identification of disease development mechanisms and new treatment targets.
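
The abstract does not describe how CMEI applies attention, so the following is only a minimal sketch of one way two modality embeddings could be combined with multi-head attention, not the authors' implementation. It assumes the LLM and KGE embeddings of an entity have already been projected to the same dimension, treats them as a two-token sequence, and mean-pools the attended outputs. The function names, the dimension of 8, the two heads, and the random weights are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(tokens, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention over `tokens`, split into heads."""
    head_dim = len(tokens[0]) // num_heads
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    concat = [[] for _ in tokens]
    all_weights = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        qh = [row[lo:hi] for row in q]
        kh = [row[lo:hi] for row in k]
        vh = [row[lo:hi] for row in v]
        scores = [
            [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(head_dim) for kj in kh]
            for qi in qh
        ]
        weights = [softmax(r) for r in scores]  # each row sums to 1
        for i, out_row in enumerate(matmul(weights, vh)):
            concat[i].extend(out_row)  # concatenate head outputs per token
        all_weights.append(weights)
    return matmul(concat, wo), all_weights


def fuse(llm_vec, kge_vec, params, num_heads=2):
    """Assumed fusion: attend across the two modality tokens, then mean-pool."""
    out, weights = multi_head_self_attention([llm_vec, kge_vec], *params, num_heads)
    fused = [sum(col) / len(out) for col in zip(*out)]
    return fused, weights


rng = random.Random(0)
dim = 8  # illustrative size only
params = [random_matrix(dim, dim, rng) for _ in range(4)]  # wq, wk, wv, wo
llm_vec = [rng.gauss(0, 1) for _ in range(dim)]  # placeholder LLM embedding
kge_vec = [rng.gauss(0, 1) for _ in range(dim)]  # placeholder KGE embedding
fused, weights = fuse(llm_vec, kge_vec, params)
```

In a sketch like this, the fused vectors for a disease and a gene/protein would then be passed to a downstream classifier that scores the candidate association; that step is not shown because the abstract does not specify it.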

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b14/11574662/bc3941f0d8f4/PRP2-12-e70034-g002.jpg
