Aggregating multimodal cancer data across unaligned embedding spaces maintains tumor of origin signal.

Author information

Kirchgaessner Raphael, Keutler Kaya, Sivakumar Layaa, Song Xubo, Ellrott Kyle

Affiliation

Oregon Health and Science University.

Publication information

bioRxiv. 2025 May 18:2025.05.14.653900. doi: 10.1101/2025.05.14.653900.

Abstract

AI-based embeddings offer the possibility of encoding complex biological data into low-dimensional spaces, called embedding spaces, that maintain the relationships between entities. An open question is whether embedding spaces created without any coordination are compatible with one another. It has been assumed that the signal in such unaligned embedding spaces would be destroyed if vectors were aggregated by summing them. To test this assumption, we trained embedding models across different data modalities and aggregated their embedded values. Our results show that the signal from unaligned embedded values is conserved and can still be used for learning tasks such as data modality and tumor-of-origin recognition.
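The core idea, that summing vectors from independently trained (unaligned) embedding spaces can still preserve a class signal, can be illustrated with a small synthetic toy. This is a minimal sketch under stated assumptions, not the authors' pipeline: every "modality" is a hypothetical embedder that places each class at its own random centroid, all embedders are assumed to share one output dimension (summing requires this), the noise level is arbitrary, and a nearest-centroid classifier is swapped in as a stand-in for whatever learning model the paper used. The names `DIM`, `N_CLASSES`, `NOISE`, and the helper functions are all hypothetical.

```python
import random

DIM = 64        # hypothetical shared embedding dimension (summing needs equal sizes)
N_CLASSES = 5   # hypothetical number of tumor-of-origin labels
NOISE = 0.5     # hypothetical per-dimension Gaussian noise


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def make_modality(rng):
    """One independently trained (unaligned) space: each class gets its own
    random centroid, with no coordination across modalities."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_CLASSES)]


def embed(rng, centroids, label):
    """Embed one sample in one modality: class centroid plus isotropic noise."""
    return [c + rng.gauss(0.0, NOISE) for c in centroids[label]]


def make_dataset(rng, modalities, n_per_class):
    data = []
    for label in range(N_CLASSES):
        for _ in range(n_per_class):
            # Aggregate by element-wise summing the sample's vectors from every modality.
            summed = [0.0] * DIM
            for centroids in modalities:
                summed = add(summed, embed(rng, centroids, label))
            data.append((summed, label))
    return data


def fit_centroids(data):
    """Nearest-centroid 'training': average the summed vectors of each class."""
    sums = [[0.0] * DIM for _ in range(N_CLASSES)]
    counts = [0] * N_CLASSES
    for vec, label in data:
        sums[label] = add(sums[label], vec)
        counts[label] += 1
    return [[x / counts[k] for x in sums[k]] for k in range(N_CLASSES)]


def predict(centroids, vec):
    return min(range(N_CLASSES), key=lambda k: sq_dist(centroids[k], vec))


def accuracy(centroids, data):
    correct = sum(predict(centroids, v) == y for v, y in data)
    return correct / len(data)


rng = random.Random(0)
modalities = [make_modality(rng) for _ in range(3)]  # three unaligned toy embedders
train = make_dataset(rng, modalities, n_per_class=50)
test = make_dataset(rng, modalities, n_per_class=20)
acc = accuracy(fit_centroids(train), test)
print(f"held-out accuracy on summed vectors: {acc:.2f} (chance = {1 / N_CLASSES:.2f})")
```

In this toy the signal survives because each class's summed centroid is the sum of class-specific centroids from every space, so different classes still land in different places even though no space was aligned to another. Real embeddings are not isotropic Gaussians, so the sketch only shows why the "summing destroys signal" assumption need not hold, not how strongly it holds for any particular model.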

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/769c/12132557/5a4edc391c8c/nihpp-2025.05.14.653900v1-f0001.jpg
