Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America.
Department of Neuroscience, Karolinska Institutet, Solna, Sweden.
PLoS Biol. 2021 Jul 19;19(7):e3001341. doi: 10.1371/journal.pbio.3001341. eCollection 2021 Jul.
High-throughput, spatially resolved gene expression techniques are poised to be transformative across biology by overcoming a central limitation in single-cell biology: the lack of information on relationships that organize the cells into the functional groupings characteristic of tissues in complex multicellular organisms. Spatial expression is particularly interesting in the mammalian brain, which has a highly defined structure, strong spatial constraint in its organization, and detailed multimodal phenotypes for cells and ensembles of cells that can be linked to mesoscale properties such as projection patterns, and from there, to circuits generating behavior. However, as with any type of expression data, cross-dataset benchmarking of spatial data is a crucial first step. Here, we assess the replicability, with reference to canonical brain subdivisions, between the Allen Institute's in situ hybridization data from the adult mouse brain (Allen Brain Atlas (ABA)) and a similar dataset collected using spatial transcriptomics (ST). With the advent of tractable spatial techniques, for the first time, we are able to benchmark the Allen Institute's whole-brain, whole-transcriptome spatial expression dataset with a second independent dataset that similarly spans the whole brain and transcriptome. We use regularized linear regression (LASSO), linear regression, and correlation-based feature selection in a supervised learning framework to classify expression samples relative to their assayed location. We show that Allen Reference Atlas labels are classifiable using transcription in both datasets, but that performance is higher in the ABA than in ST. Furthermore, models trained in one dataset and tested in the opposite dataset do not reproduce classification performance bidirectionally. While an identifying expression profile can be found for a given brain area, it does not generalize to the opposite dataset.
In general, we found that canonical brain area labels are classifiable in gene expression space within a dataset and that our observed performance does not merely reflect physical distance in the brain. However, we also show that cross-platform classification is not robust. Emerging spatial datasets from the mouse brain will allow further characterization of cross-dataset replicability, ultimately providing a valuable reference set for understanding the cell biology of the brain.
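The within-dataset versus cross-dataset comparison described above can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses simulated data rather than ABA or ST measurements, it replaces LASSO with the simplest form of correlation-based feature selection (a single top-ranked gene), and it assumes AUROC as the performance measure. The function names (`pearson`, `auroc`, `select_gene`, `evaluate`, `simulate`) and all parameters are hypothetical.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def auroc(scores, labels):
    """AUROC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_gene(X, y):
    """Correlation-based feature selection: index and sign of the top gene."""
    cors = [pearson([row[j] for row in X], y) for j in range(len(X[0]))]
    j = max(range(len(cors)), key=lambda k: abs(cors[k]))
    return j, (1.0 if cors[j] >= 0 else -1.0)

def evaluate(gene, sign, X, y):
    """Score samples by the selected gene and report AUROC for the area label."""
    return auroc([sign * row[gene] for row in X], y)

def simulate(marker, n=200, n_genes=20, seed=0):
    """Toy 'platform': label 1 = sample inside the brain area (assumption).
    Only the gene at index `marker` carries signal; the rest is noise."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    X = [[rng.gauss(0, 1) + (2.0 * label if j == marker else 0.0)
          for j in range(n_genes)] for label in y]
    return X, y

if __name__ == "__main__":
    # Platform A: train and held-out sets share the same marker gene.
    X_train, y_train = simulate(marker=3, seed=1)
    X_within, y_within = simulate(marker=3, seed=2)
    # Platform B: a different gene marks the area (simulated platform effect).
    X_cross, y_cross = simulate(marker=7, seed=3)

    gene, sign = select_gene(X_train, y_train)
    print("within-dataset AUROC:", evaluate(gene, sign, X_within, y_within))
    print("cross-dataset AUROC: ", evaluate(gene, sign, X_cross, y_cross))
```

In this toy setting the selected gene separates the area well on held-out samples from the same simulated platform, but carries no signal on the second platform, which mirrors in miniature the kind of failure to generalize that the abstract reports. A fuller treatment would fit a multi-gene model such as LASSO and evaluate in both directions.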