Wheeler Diek W, Ascoli Giorgio A
Center for Neural Informatics, Structures, & Plasticity, Krasnow Institute for Advanced Study; and Bioengineering Department, Volgenau School of Engineering; George Mason University, Fairfax, VA, USA.
Neural Regen Res. 2025 Sep 1;20(9):2697-2705. doi: 10.4103/NRR.NRR-D-24-00532. Epub 2024 Sep 24.
Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need to organize and interpret large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods for determining the appropriate classification granularity are missing. We recently introduced a technique that systematically identifies when to stop subdividing clusters, based on the fundamental principle that cells must differ more between clusters than within them. Here we present the corresponding protocol, which classifies cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. The protocol's general-purpose functions are applicable to any cellular dataset that can be organized as a two-dimensional matrix of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol on cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
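The general idea of pairing hierarchical clustering with a statistical stopping rule can be illustrated with a minimal Python sketch. It is not the published protocol: the abstract does not name the linkage method or the statistical test, so the choices here (Ward linkage, a between/within distance ratio, and a Monte Carlo comparison against a single-Gaussian null) are assumptions made for illustration, as are all function names and parameters (`bisect`, `separation`, `split_p_value`, `classify`, `alpha`, `min_size`, `n_null`, `seed`). The input is assumed to be a two-dimensional numerical matrix with one row per cell and one column per measured feature.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, cdist


def bisect(rows):
    """Cut a Ward dendrogram of `rows` into two groups; return a boolean mask."""
    tree = linkage(rows, method="ward")
    return fcluster(tree, t=2, criterion="maxclust") == 1


def separation(rows, mask):
    """Mean between-group distance divided by mean within-group distance."""
    a, b = rows[mask], rows[~mask]
    if len(a) < 2 or len(b) < 2:
        return 0.0  # degenerate split: treat as no separation
    within = np.concatenate([pdist(a), pdist(b)]).mean()
    between = cdist(a, b).mean()
    return between / within if within > 0 else np.inf


def split_p_value(rows, mask, n_null=200, rng=None):
    """Monte Carlo p-value of a split against a single-Gaussian null (assumed)."""
    rng = np.random.default_rng(rng)
    observed = separation(rows, mask)
    mean = rows.mean(axis=0)
    cov = np.atleast_2d(np.cov(rows, rowvar=False))
    exceed = 0
    for _ in range(n_null):
        # Unimodal surrogate data with the same mean and covariance,
        # bisected in exactly the same way as the real data.
        null = rng.multivariate_normal(mean, cov, size=len(rows))
        exceed += separation(null, bisect(null)) >= observed
    return (exceed + 1) / (n_null + 1)


def classify(data, alpha=0.05, min_size=6, n_null=200, seed=0):
    """Recursively bisect the rows of `data`; return one integer label per row."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(data), dtype=int)
    next_label = 1
    stack = [np.arange(len(data))]
    while stack:
        idx = stack.pop()
        if len(idx) < min_size:
            continue  # too few cells to test a further split
        mask = bisect(data[idx])
        # Keep the split only if cells differ more between the two groups
        # than within them, beyond what unimodal data would produce.
        if split_p_value(data[idx], mask, n_null, rng) < alpha:
            labels[idx[~mask]] = next_label
            next_label += 1
            stack.extend([idx[mask], idx[~mask]])
    return labels
```

Calling `classify(matrix)` on such a matrix returns a cluster label for every row; subdivision of a branch stops as soon as a candidate split fails the test. The null model is included because clusters found and tested on the same data tend to look separated even when the data are unimodal, so a plain comparison of between-cluster and within-cluster distances would favor over-splitting.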