Heterogeneous biomedical entity representation learning for gene-disease association prediction.

Affiliations

School of Computing Science, University of Glasgow, 18 Lilybank Gardens, Glasgow G12 8RZ, UK.

School of Natural and Computing Science, University of Aberdeen King's College, Aberdeen, AB24 3FX, UK.

Publication information

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae380.

DOI:10.1093/bib/bbae380
PMID:39154194
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11330343/
Abstract

Understanding the genetic basis of disease is a fundamental aspect of medical research, as genes are the classic units of heredity and play a crucial role in biological function. Identifying associations between genes and diseases is critical for diagnosis, prevention, prognosis, and drug development. Genes that encode proteins with similar sequences are often implicated in related diseases, as proteins causing identical or similar diseases tend to show limited variation in their sequences. Predicting gene-disease association (GDA) requires time-consuming and expensive experiments on a large number of potential candidate genes. Although methods have been proposed to predict associations between genes and diseases using traditional machine learning algorithms and graph neural networks, these approaches struggle to capture the deep semantic information within the genes and diseases and are dependent on training data. To alleviate this issue, we propose a novel GDA prediction model named FusionGDA, which utilizes a pre-training phase with a fusion module to enrich the gene and disease semantic representations encoded by pre-trained language models. Multi-modal representations are generated by the fusion module, which includes rich semantic information about two heterogeneous biomedical entities: protein sequences and disease descriptions. Subsequently, the pooling aggregation strategy is adopted to compress the dimensions of the multi-modal representation. In addition, FusionGDA employs a pre-training phase leveraging a contrastive learning loss to extract potential gene and disease features by training on a large public GDA dataset. To rigorously evaluate the effectiveness of the FusionGDA model, we conduct comprehensive experiments on five datasets and compare our proposed model with five competitive baseline models on the DisGeNet-Eval dataset. Notably, our case study further demonstrates the ability of FusionGDA to discover hidden associations effectively. 
The complete code and datasets of our experiments are available at https://github.com/ZhaohanM/FusionGDA.
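
The abstract names two steps, pooling aggregation and a contrastive learning loss, without giving their exact form. The sketch below illustrates the general idea under stated assumptions: mean pooling over token embeddings and an InfoNCE-style loss with in-batch negatives. It is not the FusionGDA implementation; the function names `mean_pool` and `info_nce`, the `temperature` parameter, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one fixed-size vector.

    Assumption: mean pooling stands in for the paper's pooling aggregation.
    """
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(gene_vecs, disease_vecs, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed form, not taken from the paper).

    gene_vecs[i] and disease_vecs[i] form a positive pair; every other
    disease in the batch acts as a negative for gene i.
    """
    losses = []
    for i, gene in enumerate(gene_vecs):
        logits = [cosine(gene, dis) / temperature for dis in disease_vecs]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy token embeddings for two genes and two disease descriptions.
    genes = [mean_pool([[1.0, 0.0], [0.8, 0.2]]), mean_pool([[0.0, 1.0], [0.2, 0.8]])]
    diseases = [mean_pool([[0.9, 0.1]]), mean_pool([[0.1, 0.9]])]
    print("contrastive loss:", info_nce(genes, diseases))
```

The loss is small when each gene vector is closest to its paired disease vector and grows when the pairs are mismatched, which is the property a contrastive pre-training phase relies on.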

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f498/11330343/52024f9a1929/bbae380f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f498/11330343/e92130cdc311/bbae380f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f498/11330343/c7b602685460/bbae380f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f498/11330343/44feff02b507/bbae380f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f498/11330343/cc9d2257bded/bbae380f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f498/11330343/7bbf8e2aef86/bbae380f5.jpg

Similar articles

1
Heterogeneous biomedical entity representation learning for gene-disease association prediction.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae380.
2
Global-local aware Heterogeneous Graph Contrastive Learning for multifaceted association prediction in miRNA-gene-disease networks.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae443.
3
Learning global dependencies and multi-semantics within heterogeneous graph for predicting disease-related lncRNAs.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac361.
4
Predicting miRNA-disease association via graph attention learning and multiplex adaptive modality fusion.
Comput Biol Med. 2024 Feb;169:107904. doi: 10.1016/j.compbiomed.2023.107904. Epub 2023 Dec 28.
5
Exploring potential circRNA biomarkers for cancers based on double-line heterogeneous graph representation learning.
BMC Med Inform Decis Mak. 2024 Jun 6;24(1):159. doi: 10.1186/s12911-024-02564-6.
6
SGCLDGA: unveiling drug-gene associations through simple graph contrastive learning.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae231.
7
HGCLAMIR: Hypergraph contrastive learning with attention mechanism and integrated multi-view representation for predicting miRNA-disease associations.
PLoS Comput Biol. 2024 Apr 23;20(4):e1011927. doi: 10.1371/journal.pcbi.1011927. eCollection 2024 Apr.
8
Effective type label-based synergistic representation learning for biomedical event trigger detection.
BMC Bioinformatics. 2024 Jul 31;25(1):251. doi: 10.1186/s12859-024-05851-1.
9
Multi-task prediction-based graph contrastive learning for inferring the relationship among lncRNAs, miRNAs and diseases.
Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad276.
10
Exploring ncRNA-Drug Sensitivity Associations via Graph Contrastive Learning.
IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1380-1389. doi: 10.1109/TCBB.2024.3385423. Epub 2024 Oct 9.

Cited by

1
Cross-attention graph neural networks for inferring gene regulatory networks with skewed degree distribution.
BMC Bioinformatics. 2025 Jul 16;26(1):179. doi: 10.1186/s12859-025-06186-1.
