pathCLIP: Detection of Genes and Gene Relations From Biological Pathway Figures Through Image-Text Contrastive Learning.

Publication information

IEEE J Biomed Health Inform. 2024 Aug;28(8):5007-5019. doi: 10.1109/JBHI.2024.3383610. Epub 2024 Aug 6.

DOI: 10.1109/JBHI.2024.3383610
PMID: 38568768
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11363067/
Abstract

In biomedical literature, biological pathways are commonly described through a combination of images and text. These pathways contain valuable information, including genes and their relationships, which provide insight into biological mechanisms and precision medicine. Curating pathway information across the literature enables the integration of this information to build a comprehensive knowledge base. While some studies have extracted pathway information from images and text independently, they often overlook the correspondence between the two modalities. In this paper, we present a pathway figure curation system named pathCLIP for identifying genes and gene relations from pathway figures. Our key innovation is the use of an image-text contrastive learning model to learn coordinated embeddings of image snippets and text descriptions of genes and gene relations, thereby improving curation. Our validation results, using pathway figures from PubMed, showed that our multimodal model outperforms models using only a single modality. Additionally, our system effectively curates genes and gene relations from multiple literature sources. Two case studies on extracting pathway information from literature of non-small cell lung cancer and Alzheimer's disease further demonstrate the usefulness of our curated pathway information in enhancing related pathways in the KEGG database.
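The abstract names image-text contrastive learning as the key idea but gives no formulation. The following is a minimal sketch of a generic CLIP-style symmetric contrastive loss, not the authors' implementation: the function names, the toy embeddings, and the temperature of 0.07 are all assumptions made for illustration. Each image snippet embedding is scored against every text embedding in the batch, and the loss is low when matching pairs (same index) have the highest similarity.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss.

    temperature=0.07 is an assumed value, not one taken from the paper.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    # logits[i][j]: scaled cosine similarity of image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

# Toy embeddings (hypothetical): three gene descriptions on orthogonal axes.
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_loss = contrastive_loss(texts, texts)                    # images match their texts
shuffled_loss = contrastive_loss(texts[1:] + texts[:1], texts)   # every pair mismatched
print(aligned_loss < shuffled_loss)
```

In this toy setup the aligned batch yields a loss near zero and the shuffled batch a large one, which is the signal a contrastive objective uses to pull matching image and text embeddings together.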

