Zhang Tianjiao, Zhao Zhongqian, Zhang Hongfei, Wu Zhenao, Wang Fang, Wang Guohua
College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China.
The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, 324000, China.
Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf443.
Identifying the cell types that make up complex tissues from single-cell sequencing data is a key problem in biology. As sequencing technologies have advanced, cell type identification has moved from analyzing single-omics scRNA-seq data to integrating multi-omics single-cell data. However, existing methods for integrative analysis of high-dimensional multi-omics single-cell sequencing data have several limitations: they rely on specific assumptions about the data distribution, they are sensitive to noise, and their clustering accuracy is constrained by the separate clustering method applied downstream. These issues limit the accuracy of cell type identification and hinder the application of such methods to large-scale datasets. To address these challenges, we propose scECDA, a novel method for aligning and integrating single-cell multi-omics data.
scECDA employs independently designed autoencoders that autonomously learn the feature distribution of each omics dataset. By incorporating enhanced contrastive learning and a differential attention mechanism, scECDA reduces the interference of noise during data integration. The model design is flexible and can adapt to single-cell omics data generated by different technology platforms. It directly outputs integrated latent features and end-to-end cell clustering results. By analyzing the distribution of the latent features, scECDA can identify key biological markers, distinguish cell subtypes, recover cluster-specific motifs, and infer trajectories. scECDA was applied to eight paired single-cell multi-omics datasets, covering data generated by the 10X Multiome, CITE-seq, and TEA-seq technologies. Compared with eight state-of-the-art methods, scECDA achieved higher cell clustering accuracy.
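The cross-modality alignment idea can be illustrated with a generic contrastive objective. The sketch below is not the scECDA implementation: it assumes a standard symmetric InfoNCE loss, and the names `info_nce`, `z_rna`, `z_atac`, and the temperature value are hypothetical choices made for illustration. It shows only that embeddings of the same cell from two omics layers score a lower loss when they are paired correctly than when the pairing is shuffled.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss (an assumed, generic contrastive objective).

    Row i of z_a and row i of z_b are treated as the positive pair
    (same cell, two modalities); all other rows act as negatives.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature
    idx = np.arange(len(z_a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)

# Synthetic stand-ins for per-modality latent features of 64 paired cells.
rng = np.random.default_rng(0)
shared = rng.normal(size=(64, 16))                      # shared cell state
z_rna = shared + 0.05 * rng.normal(size=shared.shape)   # hypothetical RNA embedding
z_atac = shared + 0.05 * rng.normal(size=shared.shape)  # hypothetical ATAC embedding

aligned_loss = info_nce(z_rna, z_atac)
shuffled_loss = info_nce(z_rna, z_atac[rng.permutation(64)])
print(f"aligned: {aligned_loss:.3f}  shuffled: {shuffled_loss:.3f}")
```

In a trained model, the embeddings would come from the per-omics encoders rather than from synthetic noise, and the loss would be minimized jointly with the reconstruction and clustering terms; those details are outside what the abstract specifies.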
The scECDA code is freely available at https://github.com/SuperheroBetter/scECDA.