Recover then aggregate: unified cross-modal deep clustering with global structural information for single-cell data.

Author information

Wang Ziyi, Luo Peng, Xiao Mingming, Wang Boyang, Liu Tianyu, Sun Xiangyu

Affiliations

Department of Surgical Oncology and General Surgery, First Hospital of China Medical University, Shenyang 110001, PR China.

Section of Esophageal and Mediastinal Oncology, Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae485.

Abstract

Single-cell cross-modal joint clustering has been widely used to investigate the tumor microenvironment. Although numerous approaches have been proposed, accurate clustering remains the main challenge. First, because of measurement limitations, the gene expression matrix frequently contains many missing values (dropout events). Most existing clustering methods treat it as a typical multi-modal dataset without further processing. The few methods that perform recovery before clustering do not engage sufficiently with the underlying research, leading to suboptimal results. Second, existing cross-modal information fusion strategies do not ensure consistency of representations across modalities, which can lead to the integration of conflicting information and degrade performance. To address these challenges, we propose the 'Recover then Aggregate' strategy and introduce the Unified Cross-Modal Deep Clustering model. Specifically, we develop a data augmentation technique based on neighborhood similarity that iteratively imposes rank constraints on the Laplacian matrix, thereby updating the similarity matrix and recovering dropout events. Concurrently, we integrate cross-modal features and employ contrastive learning to align modality-specific representations with consistent ones, enhancing the effective integration of diverse modal information. Comprehensive experiments on five real-world multi-modal datasets demonstrate the superior effectiveness of this method in single-cell clustering tasks.
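The abstract names its two building blocks without giving equations, so the following is only a minimal NumPy sketch of the general ideas under our own simplifying assumptions, not the authors' model. It rests on one standard fact: for a symmetric similarity matrix S with Laplacian L = D - S, rank(L) = n - c exactly when the similarity graph has c connected components, which is why constraining the rank of L pushes S toward a c-cluster structure. The Gaussian kNN affinity, the median bandwidth, the weighted-mean dropout filling, the InfoNCE loss with temperature `tau`, and all function names (`knn_similarity`, `n_connected_components`, `recover_dropouts`, `info_nce`) are hypothetical illustrative choices. The sketch checks the rank property rather than optimizing under it.

```python
import numpy as np


def knn_similarity(X, k=3):
    """Symmetric kNN affinity with Gaussian weights (hypothetical choice, not the paper's)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    sigma2 = np.median(d2[d2 > 0])  # bandwidth heuristic: median positive distance
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # position 0 is the cell itself
        S[i, nbrs] = np.exp(-d2[i, nbrs] / sigma2)
    return (S + S.T) / 2  # symmetrise so the Laplacian is symmetric


def n_connected_components(S, tol=1e-8):
    """rank(L) = n - c, so c equals the number of (near-)zero eigenvalues of L = D - S."""
    L = np.diag(S.sum(axis=1)) - S
    eigvals = np.linalg.eigvalsh(L)
    return int(np.sum(eigvals < tol))


def recover_dropouts(X, S):
    """Hypothetical recovery: replace zeros by the similarity-weighted mean of neighbours."""
    W = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)  # row-normalise weights
    X_hat = W @ X
    return np.where(X == 0, X_hat, X)  # keep observed (non-zero) entries unchanged


def info_nce(Z_m, Z_c, tau=0.5):
    """InfoNCE: row i of modality-specific Z_m should match row i of consistent Z_c."""
    Z_m = Z_m / np.linalg.norm(Z_m, axis=1, keepdims=True)
    Z_c = Z_c / np.linalg.norm(Z_c, axis=1, keepdims=True)
    logits = Z_m @ Z_c.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))  # positives sit on the diagonal


# Toy data: two well-separated groups of three cells, one dropout at X[2, 0]
X = np.array([[1.0, 2.0], [1.1, 2.1], [0.0, 2.05],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.2]])
S = knn_similarity(X, k=2)
print(n_connected_components(S))  # 2
X_rec = recover_dropouts(X, S)
print(X_rec[2, 0])  # filled from its two neighbours, so it lies between 1.0 and 1.1

Z = np.eye(3)
print(info_nce(Z, Z) < info_nce(Z, Z[::-1]))  # True: aligned pairs give lower loss
```

In a real pipeline the similarity update and the recovery would alternate, so that a cleaner S yields a better imputation and vice versa; the sketch performs a single pass to keep the rank property and the alignment loss visible.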

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/46ed/11445907/66d75a8dc73a/bbae485f1.jpg