Aggregating multimodal cancer data across unaligned embedding spaces maintains tumor of origin signal.

Author information

Kirchgaessner Raphael, Keutler Kaya, Sivakumar Layaa, Song Xubo, Ellrott Kyle

Affiliation

Oregon Health and Science University.

Publication information

bioRxiv. 2025 May 18:2025.05.14.653900. doi: 10.1101/2025.05.14.653900.

DOI:10.1101/2025.05.14.653900

PMID:40462901

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12132557/

Abstract

AI-based embeddings offer the possibility of encoding complex biological data into low-dimensional spaces, called embedding spaces, that maintain the relationships between entities. It remains an open question whether embedding spaces created without any coordination are compatible. It has been assumed that signals in these unaligned embedding spaces would be destroyed if vectors were aggregated into summed values. To test this assumption, we trained embedding models across different data modalities and aggregated their values. Our research shows that signal from unaligned embedded values is conserved and can still be used for learning tasks, such as recognizing data modality and tumor of origin.
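The idea of summing vectors from uncoordinated embedding spaces can be illustrated with a small synthetic sketch. This is not the authors' pipeline, models, or data: the two "modalities", the random linear projections standing in for embedding models, the class structure standing in for tumor of origin, and all names and sizes below are assumptions chosen only for illustration. A purely linear toy like this is also far simpler than trained neural embeddings, so it shows only that summation does not necessarily erase class signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen for illustration only.
n_classes, n_per_class, latent_dim, embed_dim = 4, 200, 8, 32

# Synthetic class signal (a stand-in for tumor of origin).
class_means = rng.normal(scale=3.0, size=(n_classes, latent_dim))
labels = np.repeat(np.arange(n_classes), n_per_class)


def embed(latent, projection):
    """Toy 'embedding model': a fixed random linear map plus noise."""
    noise = rng.normal(scale=0.5, size=(latent.shape[0], projection.shape[1]))
    return latent @ projection + noise


# Each modality observes the same samples with independent noise.
latent_a = class_means[labels] + rng.normal(size=(labels.size, latent_dim))
latent_b = class_means[labels] + rng.normal(size=(labels.size, latent_dim))

# Independent random projections, so the two spaces are not aligned.
proj_a = rng.normal(size=(latent_dim, embed_dim))
proj_b = rng.normal(size=(latent_dim, embed_dim))

# Aggregate by element-wise summation of the two embeddings.
summed = embed(latent_a, proj_a) + embed(latent_b, proj_b)

# Hold out half of the samples for evaluation.
order = rng.permutation(labels.size)
train, test = order[: labels.size // 2], order[labels.size // 2:]

# Nearest-centroid classifier fitted on the summed training vectors.
centroids = np.stack(
    [summed[train][labels[train] == c].mean(axis=0) for c in range(n_classes)]
)
dists = np.linalg.norm(summed[test][:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)

accuracy = float((pred == labels[test]).mean())
print(f"nearest-centroid accuracy on summed embeddings: {accuracy:.2f}")
```

With four balanced classes, chance accuracy is 0.25, so a held-out accuracy well above that level indicates that class information survives the summation in this toy setting.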

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/769c/12132557/5a4edc391c8c/nihpp-2025.05.14.653900v1-f0001.jpg
