Center of Functionally Integrative Neuroscience, Department of Clinical Medicine, Aarhus University, Aarhus, Denmark.
Center for Proteins in Memory, Department of Biomedicine, Aarhus University, Aarhus, Denmark.
J Neurophysiol. 2023 Aug 1;130(2):364-379. doi: 10.1152/jn.00054.2023. Epub 2023 Jul 5.
Unsupervised, data-driven methods are commonly used in neuroscience to automatically decompose data into interpretable patterns. These patterns differ from one another depending on the assumptions of the models. How these assumptions affect specific data decompositions in practice, however, is often unclear, which hinders model applicability and interpretability. For instance, the hidden Markov model (HMM) automatically detects characteristic, recurring activity patterns (so-called states) from time series data. Each state is defined by a probability distribution whose state-specific parameters are estimated from the data. But which specific features, of all those that the data contain, do the states capture? That depends on the choice of probability distribution and on other model hyperparameters. Using both synthetic and real data, we aim to better characterize the behavior of two HMM types that can be applied to electrophysiological data. Specifically, we study which differences in data features (such as frequency, amplitude, or signal-to-noise ratio) are more salient to the models and therefore more likely to drive the state decomposition. Overall, we aim to provide guidance for the appropriate use of this type of analysis on one- or two-channel neural electrophysiological data and for an informed interpretation of its results given the characteristics of the data and the purpose of the analysis.
Compared with classical supervised methods, unsupervised methods of analysis have the advantage of being freer of subjective biases. However, it is not always clear which aspects of the data these methods are most sensitive to, which complicates interpretation. Focusing on the hidden Markov model, commonly used to describe electrophysiological data, we explore in detail the nature of its estimates through simulations and real data examples, providing important insights into what to expect from these models.
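To make concrete the idea that a state is defined by a probability distribution, and that the choice of distribution determines which data features drive the decomposition, the following is a minimal sketch under stated assumptions. It is not either of the two HMM types studied here: it assumes a zero-mean Gaussian observation model in which states differ only in amplitude (standard deviation), uses known rather than estimated parameters, and decodes the state sequence with the Viterbi algorithm. The function names, the two standard deviations, and the stay probability are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_two_state_signal(n_samples=2000, stay_prob=0.99,
                              stds=(1.0, 3.0), seed=0):
    """Simulate a 1-channel signal whose amplitude depends on a hidden state.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    states = np.zeros(n_samples, dtype=int)
    for t in range(1, n_samples):
        # Stay in the current state with probability stay_prob, else switch.
        if rng.random() < stay_prob:
            states[t] = states[t - 1]
        else:
            states[t] = 1 - states[t - 1]
    # Zero-mean Gaussian noise with a state-specific standard deviation.
    signal = rng.normal(0.0, np.asarray(stds)[states])
    return states, signal


def viterbi_gaussian(signal, stds, stay_prob):
    """Most likely state path for a zero-mean Gaussian HMM with known parameters."""
    stds = np.asarray(stds, dtype=float)
    n_states = len(stds)
    n = len(signal)

    # Log-likelihood of each sample under each state's distribution.
    log_emis = (-0.5 * np.log(2.0 * np.pi * stds ** 2)[None, :]
                - 0.5 * (signal[:, None] / stds[None, :]) ** 2)

    # Transition matrix: stay_prob on the diagonal, remainder shared equally.
    switch_prob = (1.0 - stay_prob) / (n_states - 1)
    log_trans = np.log(np.full((n_states, n_states), switch_prob))
    np.fill_diagonal(log_trans, np.log(stay_prob))

    score = np.empty((n, n_states))
    backptr = np.zeros((n, n_states), dtype=int)
    score[0] = np.log(1.0 / n_states) + log_emis[0]  # uniform initial state
    for t in range(1, n):
        # cand[i, j]: best score ending in state i at t-1, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(cand, axis=0)
        score[t] = cand[backptr[t], np.arange(n_states)] + log_emis[t]

    # Trace the best path backwards from the final sample.
    path = np.empty(n, dtype=int)
    path[-1] = np.argmax(score[-1])
    for t in range(n - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path


true_states, signal = simulate_two_state_signal()
decoded = viterbi_gaussian(signal, stds=(1.0, 3.0), stay_prob=0.99)
accuracy = np.mean(decoded == true_states)
print(f"fraction of samples decoded correctly: {accuracy:.3f}")
```

Because the observation model in this sketch only encodes amplitude, two states that differed in frequency content but not in variance would be indistinguishable to it. Capturing such differences would require a different choice of probability distribution.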