Shan Kevin Q, Lubenov Evgueniy V, Siapas Athanassios G
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, United States; Division of Engineering and Applied Science, California Institute of Technology, Pasadena, United States.
J Neurosci Methods. 2017 Aug 15;288:82-98. doi: 10.1016/j.jneumeth.2017.06.017. Epub 2017 Jun 23.
Chronic extracellular recordings are a powerful tool for systems neuroscience, but spike sorting remains a challenge. A common approach is to fit a generative model, such as a mixture of Gaussians, to the observed spike data. Even if non-parametric methods are used for spike sorting, such generative models provide a quantitative measure of unit isolation quality, which is crucial for subsequent interpretation of the sorted spike trains.
We present a spike sorting strategy that models the data as a mixture of drifting t-distributions. This model captures two important features of chronic extracellular recordings (cluster drift over time and heavy tails in the distribution of spikes) and offers improved robustness to outliers.
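As a rough illustration of the idea, the sketch below scores a spike against a two-component mixture of t-distributions whose centers move over time. It is not the authors' implementation. It makes several simplifying assumptions that are not from the source: features are one-dimensional, each cluster center is linearly interpolated between hypothetical knot times, and the scale, mixing weights, and degrees of freedom (`nu`) are fixed by hand rather than fitted. All function and parameter names are hypothetical.

```python
import math

def t_logpdf(x, mu, sigma, nu):
    """Log-density of a univariate Student's t (location mu, scale sigma, nu d.o.f.)."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def drifting_mean(t, knots, values):
    """Cluster center at time t, linearly interpolated between knots (assumed drift model)."""
    if t <= knots[0]:
        return values[0]
    for (t0, m0), (t1, m1) in zip(zip(knots, values), zip(knots[1:], values[1:])):
        if t <= t1:
            return m0 + (m1 - m0) * (t - t0) / (t1 - t0)
    return values[-1]  # clamp after the last knot

def posteriors(x, t, clusters, nu=3.0):
    """Posterior probability of each cluster for a spike with feature x at time t."""
    logp = [math.log(c["weight"])
            + t_logpdf(x, drifting_mean(t, c["knots"], c["means"]), c["scale"], nu)
            for c in clusters]
    m = max(logp)  # subtract the max before exponentiating for numerical stability
    w = [math.exp(lp - m) for lp in logp]
    s = sum(w)
    return [wi / s for wi in w]

# Two hypothetical units whose centers both drift upward by 2 units over 10 time units.
clusters = [
    {"weight": 0.5, "knots": [0.0, 10.0], "means": [0.0, 2.0], "scale": 0.5},
    {"weight": 0.5, "knots": [0.0, 10.0], "means": [3.0, 5.0], "scale": 0.5},
]
early = posteriors(2.0, 0.0, clusters)   # same feature value, start of recording
late = posteriors(2.0, 10.0, clusters)   # same feature value, end of recording
```

In this toy setup, the same feature value is attributed mostly to the second unit early in the recording and mostly to the first unit late in the recording, which is the kind of ambiguity a stationary model cannot resolve. The heavy tails of the t-distribution also mean that a distant outlier lowers the likelihood far less sharply than it would under a Gaussian.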
We evaluate this model on several thousand hours of chronic tetrode recordings and show that it fits the empirical data substantially better than a mixture of Gaussians. We also provide a software implementation that can re-fit long datasets in a few seconds, enabling interactive clustering of chronic recordings.
We identify three common failure modes of spike sorting methods that assume stationarity and evaluate their impact given the empirically observed cluster drift in chronic recordings. Using hybrid ground truth datasets, we also demonstrate that our model-based estimate of misclassification error is more accurate than previous unit isolation metrics.
The mixture of drifting t-distributions model enables efficient spike sorting of long datasets and provides an accurate measure of unit isolation quality over a wide range of conditions.