Kirchgaessner Raphael, Keutler Kaya, Sivakumar Layaa, Song Xubo, Ellrott Kyle
Oregon Health and Science University.
bioRxiv. 2025 May 18:2025.05.14.653900. doi: 10.1101/2025.05.14.653900.
AI-based embeddings offer the possibility of encoding complex biological data into low-dimensional spaces, called embedding spaces, that preserve the relationships between entities. It remains an open question whether embedding spaces created without any coordination are compatible with one another. It has been assumed that the signals in these unaligned embedding spaces would be destroyed if their vectors were aggregated by summation. To test this assumption, we trained embedding models on different data modalities and aggregated the resulting vectors. Our results show that signal from unaligned embeddings is conserved after aggregation and can still be used for learning tasks, such as data modality and tumor-of-origin recognition.
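The aggregation idea in the abstract can be illustrated with a small synthetic sketch. This is not the authors' pipeline, models, or data: the two "modalities" below are hypothetical stand-ins in which each class (a proxy for tumor of origin) receives an independently drawn random centroid per modality, so the two spaces share no coordinate system. The function names, dimensions, and noise level are all assumptions chosen for illustration. The sketch sums the two vectors per sample and checks whether a simple nearest-centroid classifier can still recover the class label.

```python
import numpy as np


def make_unaligned_embeddings(labels, n_classes, dim, noise, rng):
    """Hypothetical stand-in for one modality's embedding model.

    Each call draws fresh random class centroids, so embeddings produced by
    two separate calls live in unrelated (unaligned) coordinate systems.
    """
    centroids = rng.normal(size=(n_classes, dim))
    return centroids[labels] + noise * rng.normal(size=(labels.size, dim))


def nearest_centroid_accuracy(x_train, y_train, x_test, y_test, n_classes):
    """Fit per-class mean vectors on the training split, score the test split."""
    centroids = np.stack(
        [x_train[y_train == c].mean(axis=0) for c in range(n_classes)]
    )
    # Squared Euclidean distance from every test vector to every class centroid.
    dists = ((x_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((dists.argmin(axis=1) == y_test).mean())


rng = np.random.default_rng(0)
n_classes, dim, n_samples = 5, 32, 1000  # assumed toy sizes
labels = rng.integers(0, n_classes, size=n_samples)

# Two independently generated embedding spaces for the same samples.
emb_a = make_unaligned_embeddings(labels, n_classes, dim, 1.0, rng)
emb_b = make_unaligned_embeddings(labels, n_classes, dim, 1.0, rng)

# Aggregate by element-wise summation, with no alignment step.
summed = emb_a + emb_b

split = n_samples // 2
accuracy = nearest_centroid_accuracy(
    summed[:split], labels[:split], summed[split:], labels[split:], n_classes
)
chance = 1.0 / n_classes
print(f"accuracy on summed vectors: {accuracy:.3f} (chance: {chance:.3f})")
```

In this toy setting the class signal survives summation because the sum of two class-specific centroids is itself class-specific. Real embedding models are far less idealized, so the sketch only shows why conservation of signal is plausible, not that it holds for any particular model.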