Sgouralis Ioannis, Pressé Steve
Department of Physics, Arizona State University, Tempe, Arizona.
Department of Physics, Arizona State University, Tempe, Arizona; School of Molecular Sciences, Arizona State University, Tempe, Arizona.
Biophys J. 2017 May 23;112(10):2117-2126. doi: 10.1016/j.bpj.2017.04.009.
Bayesian nonparametric methods have recently transformed emerging areas within data science. One such promising method, the infinite hidden Markov model (iHMM), generalizes the hidden Markov model (HMM), which has itself become a workhorse in single-molecule data analysis. The iHMM goes beyond the HMM by self-consistently learning the number of states, in addition to all parameters the HMM learns, without recourse to any model selection step. Despite this generality, simple features common to single-molecule time traces, such as drift, lead the iHMM to overinterpret drift and introduce artifact states. Here we present an adaptation of the iHMM that can treat data with drift originating from one or many traces (e.g., Förster resonance energy transfer). Our fully Bayesian method couples the iHMM to a continuous control process (drift) that is learned self-consistently alongside all other quantities determined by the iHMM (including state numbers). A key advantage of this method is that all traces, regardless of drift or of the states visited across traces, may now be treated on an equal footing, thereby eliminating user-dependent trace selection (based on drift levels), preprocessing to remove drift, and postprocessing model selection based on state number.
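To make the drift problem concrete, the following is a minimal illustrative sketch, not the authors' method or code. It simulates a two-state trace whose emission levels are shifted by a slowly varying additive drift. The function name `simulate_drifting_trace`, the linear form of the drift, the Gaussian noise, and all parameter values are assumptions chosen only for illustration; the abstract itself specifies only that the drift is a continuous process.

```python
import numpy as np

def simulate_drifting_trace(n_steps=2000, levels=(0.0, 1.0), p_stay=0.99,
                            drift_per_step=0.001, noise_sd=0.05, seed=0):
    """Simulate a two-state Markov trace with additive linear drift.

    All names and defaults are hypothetical and for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Two-state Markov chain: remain in the current state with probability p_stay.
    states = np.empty(n_steps, dtype=int)
    states[0] = 0
    for t in range(1, n_steps):
        stay = rng.random() < p_stay
        states[t] = states[t - 1] if stay else 1 - states[t - 1]

    # Assumed drift: a linear ramp added to every observation.
    drift = drift_per_step * np.arange(n_steps)

    # Observation = state level + drift + Gaussian measurement noise.
    noise = rng.normal(0.0, noise_sd, n_steps)
    signal = np.asarray(levels)[states] + drift + noise
    return states, drift, signal

states, drift, signal = simulate_drifting_trace()
```

Under these assumptions, the observed level of a given state shifts by roughly `drift[-1]` between the start and the end of the trace. An analysis that ignores drift may therefore assign early and late visits to the same state to different, artifactual states.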