pathCLIP: Detection of Genes and Gene Relations From Biological Pathway Figures Through Image-Text Contrastive Learning.

Publication Information

IEEE J Biomed Health Inform. 2024 Aug;28(8):5007-5019. doi: 10.1109/JBHI.2024.3383610. Epub 2024 Aug 6.

Abstract

In biomedical literature, biological pathways are commonly described through a combination of images and text. These pathways contain valuable information, including genes and their relationships, which provide insight into biological mechanisms and precision medicine. Curating pathway information across the literature enables the integration of this information to build a comprehensive knowledge base. While some studies have extracted pathway information from images and text independently, they often overlook the correspondence between the two modalities. In this paper, we present a pathway figure curation system named pathCLIP for identifying genes and gene relations from pathway figures. Our key innovation is the use of an image-text contrastive learning model to learn coordinated embeddings of image snippets and text descriptions of genes and gene relations, thereby improving curation. Our validation results, using pathway figures from PubMed, showed that our multimodal model outperforms models using only a single modality. Additionally, our system effectively curates genes and gene relations from multiple literature sources. Two case studies on extracting pathway information from literature of non-small cell lung cancer and Alzheimer's disease further demonstrate the usefulness of our curated pathway information in enhancing related pathways in the KEGG database.
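The central idea above is learning coordinated embeddings of image snippets and text descriptions. A generic CLIP-style contrastive objective can illustrate it. What follows is a minimal pure-Python sketch under stated assumptions. It is not the pathCLIP implementation.

The sketch makes these assumptions:

- Each image snippet and each text description has already been encoded into a fixed-length vector. The encoders are not shown.
- The i-th snippet and the i-th description in a batch form a matched pair.
- The function names (`l2_normalize`, `similarity_logits`, `diagonal_cross_entropy`, `clip_loss`, `match_snippets`) are hypothetical.
- The temperature of 0.07 is an illustrative value.

The loss is the symmetric InfoNCE form. It applies cross-entropy over cosine-similarity logits in both the image-to-text and text-to-image directions, which pushes each matched pair to score higher than every mismatched pair in the batch.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def similarity_logits(images: Sequence[Vector], texts: Sequence[Vector],
                      temperature: float = 0.07) -> List[List[float]]:
    """Entry [i][j] is cos(image_i, text_j) / temperature."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(im, tx)) / temperature for tx in txts]
            for im in imgs]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean of -log softmax(row_i)[i]; assumes a square batch where column i is row i's match."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(images: Sequence[Vector], texts: Sequence[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: average of image->text and text->image cross-entropy."""
    logits = similarity_logits(images, texts, temperature)
    transposed = [list(col) for col in zip(*logits)]  # rows now indexed by text
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def match_snippets(images: Sequence[Vector], texts: Sequence[Vector]) -> List[int]:
    """For each image snippet, return the index of the most similar text description."""
    logits = similarity_logits(images, texts, temperature=1.0)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


if __name__ == "__main__":
    # Made-up 3-d embeddings; snippet i is meant to pair with description i.
    snippets = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
    descriptions = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 0.9]]
    print(match_snippets(snippets, descriptions))  # [0, 1, 2]
    # Correct pairing gives a lower loss than a shuffled pairing.
    print(clip_loss(snippets, descriptions) < clip_loss(snippets, descriptions[::-1]))  # True
```

Averaging the two directions keeps the objective symmetric, so neither modality acts only as the query side. After training, a model of this kind could assign each detected snippet to its nearest description. `match_snippets` mimics that step on toy vectors.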
