School of Chemistry and Biochemistry and Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA.
J Phys Chem B. 2009 Oct 22;113(42):13886-90. doi: 10.1021/jp907019p.
Photon trajectories from single-molecule experiments can report on structural changes and motions of biomolecules. Hidden Markov models (HMMs) facilitate extraction of the sequence of hidden states from noisy data through construction of probabilistic models. Typically, the number of states is selected with the Bayesian information criterion (BIC); however, constraints resulting from short data sets and from Poisson-distributed photons in radiative processes such as fluorescence can limit the successful application of goodness-of-fit statistics. For single-molecule intensity trajectories, additional criteria such as peak localization error (LE) and chi-square probabilities can incorporate theoretical constraints on the experimental data while modifying the standard HMM. Chi-square minimization also serves as the stopping criterion for the iterations in which the system parameters are trained. Peak LE enables exclusion of overfitted and overlapping states. These constraints and criteria are tested against BIC on simulated single-molecule trajectories to best identify the true number of emissive levels in a given sequence.
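As a rough illustration of the quantities named in the abstract, the sketch below scores an HMM with Poisson emissions using BIC and computes a generic Pearson chi-square for binned photon counts. It is not the authors' implementation: the function names, the parameter count used in the BIC penalty (emission rates, transition probabilities, and initial probabilities), and the chi-square form (Poisson variance equal to the mean) are assumptions chosen for illustration, and the peak localization error criterion is not reproduced here.

```python
import math

def poisson_logpmf(k, rate):
    """Log-probability of observing k photons in a bin with mean `rate`."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def hmm_log_likelihood(counts, rates, trans, init):
    """Scaled forward algorithm for an HMM with Poisson emissions.

    counts: photon counts per time bin
    rates:  mean counts per bin for each hidden state
    trans:  row-stochastic transition matrix, trans[r][s] = P(s | r)
    init:   initial state probabilities
    """
    n_states = len(rates)
    alpha = list(init)
    log_lik = 0.0
    for t, c in enumerate(counts):
        if t > 0:
            # Propagate the state distribution one step forward.
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n_states))
                     for s in range(n_states)]
        # Weight each state by the emission probability of this bin.
        alpha = [alpha[s] * math.exp(poisson_logpmf(c, rates[s]))
                 for s in range(n_states)]
        # Rescale to avoid underflow and accumulate the log-likelihood.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def bic(log_lik, n_states, n_obs):
    """BIC = -2 ln L + k ln n (lower is better).

    Assumed free-parameter count k: n_states emission rates,
    n_states * (n_states - 1) transition probabilities, and
    n_states - 1 initial probabilities.
    """
    k = n_states + n_states * (n_states - 1) + (n_states - 1)
    return -2.0 * log_lik + k * math.log(n_obs)

def pearson_chi2(counts, states, rates):
    """Generic Pearson chi-square for counts given a state assignment.

    For Poisson data the variance equals the mean, so each bin
    contributes (observed - expected)^2 / expected.
    """
    return sum((c - rates[s]) ** 2 / rates[s] for c, s in zip(counts, states))

if __name__ == "__main__":
    # Toy trajectory (made-up numbers) with two apparent intensity levels.
    counts = [4, 6, 5, 3, 21, 19, 22, 18, 5, 4]
    one = hmm_log_likelihood(counts, [10.7], [[1.0]], [1.0])
    two = hmm_log_likelihood(counts, [4.5, 20.0],
                             [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5])
    print("BIC, 1 state:", bic(one, 1, len(counts)))
    print("BIC, 2 states:", bic(two, 2, len(counts)))
```

The model parameters in the demo are fixed by hand; in practice they would be trained (for example with Baum-Welch) before the criteria are compared across candidate numbers of states.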