Interpreting single-cell and spatial omics data using deep neural network training dynamics.

Author information

Karin Jonathan, Mintz Reshef, Raveh Barak, Nitzan Mor

Affiliations

School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.

Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel.

Publication information

Nat Comput Sci. 2024 Dec;4(12):941-954. doi: 10.1038/s43588-024-00721-5. Epub 2024 Dec 4.

DOI:10.1038/s43588-024-00721-5

PMID:39633094

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11659171/

Abstract

Single-cell and spatial omics datasets can be organized and interpreted by annotating single cells to distinct types, states, locations or phenotypes. However, cell annotations are inherently ambiguous, as discrete labels with subjective interpretations are assigned to heterogeneous cell populations on the basis of noisy, sparse and high-dimensional data. Here we developed Annotatability, a framework for identifying annotation mismatches and characterizing biological data structure by monitoring the dynamics and difficulty of training a deep neural network over such annotated data. Following this, we developed a signal-aware graph embedding method that enables downstream analysis of biological signals. This embedding captures cellular communities associated with target signals. Using Annotatability, we address key challenges in the interpretation of genomic data, demonstrated over eight single-cell RNA sequencing and spatial omics datasets, including identifying erroneous annotations and intermediate cell states, delineating developmental or disease trajectories, and capturing cellular heterogeneity. These results underscore the broad applicability of annotation-trainability analysis via Annotatability for unraveling cellular diversity and interpreting collective cell behaviors in health and disease.
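The idea of monitoring training dynamics to flag annotation mismatches can be illustrated with a small sketch. The abstract does not say which quantities are tracked, so this example assumes two per-cell statistics: the mean and the standard deviation, across epochs, of the probability the model gives to the assigned label. It also swaps the deep neural network for plain softmax regression on synthetic two-dimensional data to stay self-contained. The function name `training_dynamics` and all parameters are hypothetical and are not taken from the Annotatability software.

```python
import numpy as np

def training_dynamics(X, y, n_classes, epochs=50, lr=0.1):
    """Train softmax regression by full-batch gradient descent and track,
    per sample, the probability given to its assigned label at each epoch.
    Returns the per-sample mean and standard deviation over epochs.
    (Assumed summary statistics; a stand-in for a deep network.)"""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot assigned labels
    history = np.empty((epochs, n))
    for epoch in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        history[epoch] = P[np.arange(n), y]       # prob. of assigned label
        grad = (P - Y) / n                        # cross-entropy gradient
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return history.mean(axis=0), history.std(axis=0)

# Synthetic "cells": two well-separated groups, with 20 labels flipped
# on purpose to mimic erroneous annotations.
rng = np.random.default_rng(0)
n_per = 200
X = np.vstack([rng.normal([-2, 0], 1.0, (n_per, 2)),
               rng.normal([2, 0], 1.0, (n_per, 2))])
true_y = np.repeat([0, 1], n_per)
flipped = np.zeros(2 * n_per, dtype=bool)
flipped[rng.choice(2 * n_per, size=20, replace=False)] = True
y = np.where(flipped, 1 - true_y, true_y)

confidence, variability = training_dynamics(X, y, n_classes=2)
print("mean confidence, clean labels:  ", confidence[~flipped].mean())
print("mean confidence, flipped labels:", confidence[flipped].mean())
```

In this toy setting, cells whose assigned label disagrees with their expression profile tend to receive low confidence throughout training, which is the kind of signal that could be used to rank candidate annotation errors.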