Department of Chemistry, Columbia University, New York, New York, USA.
Biophys J. 2009 Dec 16;97(12):3196-205. doi: 10.1016/j.bpj.2009.09.031.
Time series data provided by single-molecule Förster resonance energy transfer (smFRET) experiments offer the opportunity to infer not only model parameters describing molecular complexes, e.g., rate constants, but also information about the model itself, e.g., the number of conformational states. Resolving whether such states exist or how many of them exist requires a careful approach to the problem of model selection, here meaning discrimination among models with differing numbers of states. The most straightforward approach to model selection generalizes the common idea of maximum likelihood (selecting the most likely parameter values) to maximum evidence (selecting the most likely model). In either case, such an inference presents a tremendous computational challenge, which we here address by exploiting an approximation technique termed variational Bayesian expectation maximization. We demonstrate how this technique can be applied to temporal data such as smFRET time series; show superior statistical consistency relative to the maximum likelihood approach; compare its performance on smFRET data generated from experiments on the ribosome; and illustrate how model selection in such probabilistic or generative modeling can facilitate analysis of closely related temporal data currently prevalent in biophysics. Source code used in this analysis, including a graphical user interface, is available open source via http://vbFRET.sourceforge.net.
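The contrast between maximum likelihood and maximum evidence can be illustrated with a toy calculation. The sketch below is not the vbFRET code and does not use a hidden Markov model or the variational approximation: it assumes a much simpler linear-Gaussian stand-in in which a noisy, FRET-like trace is fit with k equal-length constant segments, so that the evidence is available in closed form. All function names, the prior width `tau`, the noise level `sigma`, and the simulated trace are assumptions made for illustration only.

```python
import numpy as np

def segment_design(n, k):
    """Indicator matrix splitting n points into k equal-length segments."""
    return np.kron(np.eye(k), np.ones((n // k, 1)))

def max_log_likelihood(y, phi, sigma):
    """Log-likelihood at the least-squares (maximum likelihood) segment levels."""
    w_hat, *_ = np.linalg.lstsq(phi, y, rcond=None)
    resid = y - phi @ w_hat
    return -0.5 * (len(y) * np.log(2 * np.pi * sigma**2) + resid @ resid / sigma**2)

def log_evidence(y, phi, sigma, tau):
    """Exact log marginal likelihood with segment levels integrated out.

    Assumes levels ~ N(0, tau^2) and noise ~ N(0, sigma^2), which gives
    y ~ N(0, sigma^2 I + tau^2 phi phi^T).
    """
    cov = sigma**2 * np.eye(len(y)) + tau**2 * (phi @ phi.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

# Simulated trace (an assumption, not experimental data): two levels plus noise.
rng = np.random.default_rng(0)
n, sigma, tau = 64, 0.1, 1.0
true_levels = np.repeat([0.3, 0.7], n // 2)
y = true_levels + sigma * rng.normal(size=n)

# Candidate complexities; dyadic segment counts make the models nested.
ks = [1, 2, 4, 8, 16]
max_ll = [max_log_likelihood(y, segment_design(n, k), sigma) for k in ks]
log_ev = [log_evidence(y, segment_design(n, k), sigma, tau) for k in ks]

best_by_likelihood = ks[int(np.argmax(max_ll))]
best_by_evidence = ks[int(np.argmax(log_ev))]

for k, ll, ev in zip(ks, max_ll, log_ev):
    print(f"k={k:2d}  max log-likelihood={ll:9.2f}  log evidence={ev:9.2f}")
print("selected by likelihood:", best_by_likelihood)
print("selected by evidence:  ", best_by_evidence)
```

Because the candidate models are nested, the maximized likelihood cannot decrease as k grows, so likelihood alone tends to favor the most complex model. The evidence integrates over the segment levels and therefore penalizes unneeded parameters; in this toy setting one would expect it to peak near the two segments used to simulate the trace, although the exact numbers depend on the seed and on the assumed prior.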