Principled and interpretable alignability testing and integration of single-cell data.

Author information

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115.

Department of Biomedical Data Science, Stanford University, Stanford, CA 94305.

Publication information

Proc Natl Acad Sci U S A. 2024 Mar 5;121(10):e2313719121. doi: 10.1073/pnas.2313719121. Epub 2024 Feb 28.

Abstract

Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
