Learning Consistency and Specificity of Cells From Single-Cell Multi-Omic Data.

Publication information

IEEE J Biomed Health Inform. 2024 May;28(5):3134-3145. doi: 10.1109/JBHI.2024.3370868.

DOI:10.1109/JBHI.2024.3370868

Abstract

Advances in single-cell technologies now make it possible to profile the epigenome and the transcriptome together at the level of individual cells, providing opportunities to explore the underlying biological mechanisms. Despite significant effort, integrative analysis of single-cell multi-omic data remains challenging because of the heterogeneity, complicated coupling, and interpretability of the data. To address these issues, we propose a novel self-representation Learning-based Multi-omics data Integrative Clustering algorithm (sLMIC) for integrating single-cell epigenomic profiles (DNA methylation or scATAC-seq) with transcriptomic profiles (scRNA-seq), in which the consistent and specific features of cells are explicitly extracted to facilitate cell clustering. Specifically, sLMIC constructs a graph for each type of single-cell data, thereby transforming the omics data into a multi-layer network, which effectively removes the heterogeneity of the omic data. sLMIC then employs low-rank and exclusivity constraints to separate the self-representation of cells into two parts, the shared and the specific features, which explicitly characterize the consistency and diversity of the omic data and provide an effective strategy for modeling the structure of cell types. Feature extraction and cell clustering are jointly formulated in a single overall objective function, so that the latent features of the data are learned under the guidance of cell clustering. Extensive experiments on 13 single-cell multi-omics datasets from diverse organisms and tissues indicate that sLMIC clearly outperforms advanced algorithms on various measures.
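
The pipeline described in the abstract (one graph per omic layer, a self-representation of cells on each graph, and a split into shared and layer-specific parts) can be illustrated with a minimal NumPy sketch. This is not the sLMIC objective: the abstract does not give the formulation, so the sketch swaps in a ridge-regularized self-representation with a closed-form solution, and it approximates the shared/specific decomposition by a simple mean and residual in place of the low-rank and exclusivity constraints. All function names and the parameters `k` and `lam` are hypothetical.

```python
import numpy as np


def knn_graph(X, k=3):
    """Symmetric k-nearest-neighbour adjacency (cells x cells).

    X is a cells x features matrix for one omic layer.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # a cell is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]               # k closest cells per row
    n = X.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                         # symmetrize


def self_representation(A, lam=1.0):
    """Ridge self-representation (assumption, not the paper's model).

    Solves argmin_Z ||A - A Z||_F^2 + lam * ||Z||_F^2, whose closed form
    is Z = (A^T A + lam I)^{-1} A^T A.
    """
    G = A.T @ A
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), G)


def split_shared_specific(Zs):
    """Crude stand-in for the shared/specific decomposition.

    Shared part: mean of the per-layer representations.
    Specific parts: per-layer residuals (they sum to zero by construction).
    """
    shared = np.mean(Zs, axis=0)
    specific = [Z - shared for Z in Zs]
    return shared, specific


def run_sketch(views, k=3, lam=1.0):
    """views: list of cells x features matrices measured on the same cells."""
    graphs = [knn_graph(X, k) for X in views]
    Zs = [self_representation(A, lam) for A in graphs]
    return split_shared_specific(Zs)


if __name__ == "__main__":
    # Synthetic data: 20 cells, two well-separated cell types, two omic layers.
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1], 10)
    rna = rng.normal(size=(20, 6)) + 100.0 * labels[:, None]
    atac = rng.normal(size=(20, 4)) + 100.0 * labels[:, None]
    shared, specific = run_sketch([rna, atac])
    # Affinity that a standard spectral clustering step could consume.
    affinity = np.abs(shared) + np.abs(shared).T
    print(affinity.shape)
```

In this toy setting the two cell types never share a neighbour, so the shared representation is block-structured by cell type. In the paper the representation and the clustering are learned jointly within one objective function, which this two-stage sketch does not attempt.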