Zhou Han, Cao Kai, Lu Yang Young
Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada.
Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Boston, MA, 02142, United States.
Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf345.
Recent advances in single-cell multimodal omics technologies enable the exploration of cellular systems at unprecedented resolution, leading to the rapid generation of multimodal datasets that require sophisticated integration methods. Diagonal integration has emerged as a flexible solution for integrating heterogeneous single-cell data without relying on shared cells or features. However, the absence of anchoring elements introduces the risk of artificial integrations, where cells across modalities are incorrectly aligned due to ambiguous mapping.
To address this challenge, we propose SONATA (Securing diagOnal iNtegrATion against Ambiguous mapping), a novel diagnostic method designed to detect potential artificial integrations resulting from ambiguous mappings in diagonal data integration. SONATA identifies ambiguous alignments by quantifying cell-cell ambiguity within the data manifold, so that biologically meaningful integrations can be distinguished from spurious ones. SONATA is not designed to replace existing pipelines for diagonal data integration; instead, it works as an add-on to an existing pipeline to make integration more reliable. Through a comprehensive evaluation on both simulated and real multimodal single-cell datasets, we observe that artificial integrations in diagonal data integration are widespread yet surprisingly overlooked, occurring across all mainstream diagonal integration methods. We demonstrate SONATA's ability to safeguard against misleading integrations and to provide actionable insights into potential integration failures across mainstream methods. Our approach offers a robust framework for ensuring the reliability and interpretability of multimodal single-cell data integration.
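The notion of an ambiguous mapping can be illustrated with a toy example. The sketch below is not SONATA's algorithm, and every function name and parameter in it is hypothetical. It assumes that a diagonal integration method sees only the within-modality geometry of the cells, and it uses plain Euclidean distances in place of distances along the data manifold. Under those assumptions, two cells whose sorted distance profiles to all other cells coincide are interchangeable from the method's point of view, even when they lie far apart.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def distance_profiles(dist):
    """Sorted distances from each cell to every cell (itself included)."""
    return [sorted(row) for row in dist]


def ambiguous_pairs(points, profile_tol=1e-9, min_separation=1.0):
    """Return index pairs (i, j) whose distance profiles match within
    profile_tol even though the two cells are at least min_separation apart.

    Illustrative assumption: a geometry-only aligner cannot tell such
    cells apart, so either could be mapped to the other's counterpart.
    """
    dist = pairwise_distances(points)
    profiles = distance_profiles(dist)
    pairs = []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            # Largest elementwise difference between the two sorted profiles.
            gap = max(abs(a - b) for a, b in zip(profiles[i], profiles[j]))
            if gap <= profile_tol and dist[i][j] >= min_separation:
                pairs.append((i, j))
    return pairs


# Toy modality: five cells evenly spaced along a one-dimensional trajectory.
cells = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
pairs = ambiguous_pairs(cells)
print(pairs)  # [(0, 4), (1, 3)]
```

In this symmetric trajectory the two ends, and the two cells next to them, are mutually ambiguous: flipping the trajectory leaves all pairwise distances unchanged, so an alignment based on geometry alone could map it onto the other modality in either orientation.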
The source code is available at https://github.com/batmen-lab/SONATA.