Argha Bandyopadhyay, Marcel P. Goldschen-Ohm
Department of Neuroscience, University of Texas at Austin, Austin, Texas.
Biophys J. 2021 Oct 19;120(20):4472-4483. doi: 10.1016/j.bpj.2021.08.045. Epub 2021 Sep 4.
Single-molecule (SM) approaches have provided valuable mechanistic information on many biophysical systems. As technological advances lead to ever-larger data sets, tools for rapid analysis and identification of molecules exhibiting the behavior of interest are increasingly important. In many cases the underlying mechanism is unknown, making unsupervised techniques desirable. The divisive segmentation and clustering (DISC) algorithm is one such unsupervised method that idealizes noisy SM time series much faster than computationally intensive approaches without sacrificing accuracy. However, DISC relies on a user-selected objective criterion (OC) to guide its estimation of the ideal time series. Here, we explore how different OCs affect DISC's performance for data typical of SM fluorescence imaging experiments. We find that OCs differing in their penalty for model complexity each optimize DISC's performance for time series with different properties such as signal/noise and number of sample points. Using a machine learning approach, we generate a decision boundary that allows unsupervised selection of OCs based on the input time series to maximize performance for different types of data. This is particularly relevant for SM fluorescence data sets, which often have signal/noise near the derived decision boundary and include time series of nonuniform length because of stochastic bleaching. Our approach, AutoDISC, allows unsupervised per-molecule optimization of DISC, which will substantially assist in the rapid analysis of high-throughput SM data sets with noisy samples and nonuniform time windows.
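The statement that OCs differ in their penalty for model complexity can be illustrated with a small sketch. The abstract does not name the OCs that were compared, so this example assumes two standard information criteria, AIC and BIC, purely for illustration. The function names and the toy two-level trace are hypothetical and are not part of DISC or AutoDISC.

```python
import math
import random


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood of residuals (variance fitted from the data)."""
    n = len(residuals)
    variance = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)


def complexity_penalty(criterion, n_params, n_samples):
    """Penalty term of the criterion; AIC and BIC are illustrative assumptions."""
    if criterion == "AIC":
        return 2.0 * n_params                  # independent of trace length
    if criterion == "BIC":
        return n_params * math.log(n_samples)  # grows with trace length
    raise ValueError(f"unknown criterion: {criterion}")


def criterion_score(criterion, residuals, n_params):
    """Penalty minus twice the log-likelihood; a lower score is preferred."""
    penalty = complexity_penalty(criterion, n_params, len(residuals))
    return penalty - 2.0 * gaussian_log_likelihood(residuals)


def residuals_about_segment_means(trace, split=None):
    """Residuals of a one-level fit, or of a two-level fit split at a known index."""
    segments = [trace] if split is None else [trace[:split], trace[split:]]
    residuals = []
    for segment in segments:
        mean = sum(segment) / len(segment)
        residuals.extend(x - mean for x in segment)
    return residuals


# Hypothetical toy trace: two intensity levels (0 and 1) with Gaussian noise.
random.seed(0)
n_half = 100
trace = [random.gauss(0.0, 0.2) for _ in range(n_half)]
trace += [random.gauss(1.0, 0.2) for _ in range(n_half)]

one_level = residuals_about_segment_means(trace)
two_level = residuals_about_segment_means(trace, split=n_half)

for criterion in ("AIC", "BIC"):
    # Parameter counts: level means plus one shared noise variance.
    score_one = criterion_score(criterion, one_level, n_params=2)
    score_two = criterion_score(criterion, two_level, n_params=3)
    preferred = "two-level" if score_two < score_one else "one-level"
    print(f"{criterion}: one-level={score_one:.1f}, two-level={score_two:.1f}, "
          f"preferred={preferred}")
```

In this sketch the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once a trace has more than about 7 samples (n > e^2), and it keeps growing with trace length. This is one way in which the number of sample points can change which criterion favors a more complex idealization, consistent with the dependence on trace length described above.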