Winters-Hilt Stephen, Baribault Carl
Dept. of Computer Science, University of New Orleans, New Orleans, LA 70148, USA.
BMC Bioinformatics. 2007 Nov 1;8(Suppl 7):S19. doi: 10.1186/1471-2105-8-S7-S19.
Hidden Markov Models (HMMs) provide an excellent means for structure identification and feature extraction on stochastic sequential data. An HMM-with-Duration (HMMwD) is an HMM that can also exactly model the length (recurrence) distribution of each hidden label, whereas a standard HMM implicitly imposes a best-fit geometric distribution on those lengths.
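The geometric constraint mentioned above can be made concrete with a small sketch. A standard HMM state with self-transition probability a stays for exactly d steps with probability a^(d-1)(1-a), so its most likely duration is always d = 1. The function names and the peaked target distribution below are hypothetical, chosen only for illustration, and "best fit" is taken here to mean the maximum-likelihood geometric fit, which matches the mean duration.

```python
def geometric_duration_pmf(self_transition, max_len):
    """P(duration = d), d = 1..max_len, for a standard HMM state that
    repeats with probability `self_transition` at every step."""
    a = self_transition
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_len + 1)]


def best_fit_self_transition(duration_pmf):
    """Maximum-likelihood geometric fit: pick the self-transition
    probability whose mean duration 1 / (1 - a) equals the target mean."""
    mean = sum(d * p for d, p in enumerate(duration_pmf, start=1))
    return 1.0 - 1.0 / mean


def total_variation(p, q):
    """Total variation distance between two pmfs on the same support."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


# Hypothetical duration distribution, peaked at d = 5 (not geometric).
target = [0.0, 0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0]

a = best_fit_self_transition(target)
fit = geometric_duration_pmf(a, len(target))
mismatch = total_variation(target, fit)
```

Even with the mean matched, the fitted distribution puts its peak at d = 1 rather than d = 5, which is the kind of mismatch an explicit duration model avoids.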
A novel, fast HMM-with-Duration (HMMwD) implementation is presented, together with experimental results that demonstrate its performance on two-state synthetic data designed to model nanopore detector data. The HMMwD results are compared to (i) the ideal model and (ii) the conventional HMM. The HMMwD is clearly more accurate than the standard HMM, and it matches the ideal solution in many cases where the standard HMM does not. Computationally, the new HMMwD has all the speed advantages of the conventional (simpler) HMM implementation. In preliminary work shown here, HMM feature extraction is also used to establish the first pattern recognition-informed (PRI) sampling control of a nanopore detector device (on a "live" data stream).
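As a rough illustration of what two-state synthetic data with explicit duration distributions might look like, the sketch below alternates between two hidden labels, draws each segment length from a per-state duration table, and emits noisy observations around a per-state level. Everything here is an assumption for illustration: the Gaussian emissions, the parameter values, and the function names are hypothetical and are not taken from the paper.

```python
import random


def sample_two_state_sequence(duration_pmfs, means, sigma, n_segments, seed=0):
    """Sample hidden labels and observations from a two-state model with
    explicit duration distributions.

    duration_pmfs[s][d - 1] is P(duration = d) for state s.
    Emissions are Gaussian around means[s] (an assumption of this sketch).
    """
    rng = random.Random(seed)
    labels, signal = [], []
    state = 0
    for _ in range(n_segments):
        pmf = duration_pmfs[state]
        # Draw the segment length directly from the duration table.
        d = rng.choices(range(1, len(pmf) + 1), weights=pmf, k=1)[0]
        labels.extend([state] * d)
        signal.extend(rng.gauss(means[state], sigma) for _ in range(d))
        state = 1 - state  # two states, so segments strictly alternate
    return labels, signal


def run_lengths(labels):
    """Return (label, length) pairs for each maximal run in `labels`."""
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    return [(lab, n) for lab, n in runs]


# Hypothetical parameters: state 0 lasts 3 or 4 steps, state 1 exactly 2.
pmfs = [[0.0, 0.0, 0.5, 0.5], [0.0, 1.0]]
labels, signal = sample_two_state_sequence(
    pmfs, means=[0.0, 1.0], sigma=0.1, n_segments=20
)
runs = run_lengths(labels)
```

Because the segment lengths come from arbitrary tables rather than a self-transition probability, data of this kind can carry duration structure that a standard HMM can only approximate geometrically.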
The improved accuracy of the new HMMwD implementation, at the same order of computational cost as the standard HMM, is an important augmentation for applications in gene structure identification and channel current analysis, especially PRI sampling control, where speed is essential. The PRI experiment was designed to inherit the high accuracy of the well-characterized and distinctive blockades of the DNA hairpin molecules used as controls (or blockade "test-probes"). For this test set, the accuracy inherited is 99.9%.