Udaka Keiko, Mamitsuka Hiroshi, Nakaseko Yukinobu, Abe Naoki
Department of Biophysics, Kyoto University, Japan.
J Biol Phys. 2002 Jun;28(2):183-94. doi: 10.1023/A:1019931731519.
A query learning algorithm based on hidden Markov models (HMMs) is developed to design experiments for string analysis and prediction of MHC class I binding peptides. Query learning is introduced to reduce the number of peptide binding data needed to train the HMMs. Multiple HMMs, which collectively serve as a committee, are trained with binding data and used to make real-valued predictions. The universe of peptides is randomly sampled and subjected to judgement by the HMMs. Peptides whose predictions are least consistent among the committee HMMs are tested by experiment. By iterating this feedback cycle of computational analysis and experiment, the most wanted information is effectively extracted. After 7 rounds of active learning with 181 peptides in all, the predictive performance of the algorithm surpassed that of the best-performing matrix-based prediction method to date. Moreover, by combining both methods, binder peptides (log Kd < -6) could be predicted with 84% accuracy. The parameter distributions of the HMMs, which can be inspected visually after training, further offer a glimpse of the dynamic specificity of the MHC molecules.
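The committee-based selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The committee members here are arbitrary scoring functions standing in for trained HMMs, disagreement is measured as the population variance of the committee's predictions (the abstract only says "least consistent"), and the peptide length of 9 and the names `sample_peptides` and `select_queries` are all assumptions made for illustration.

```python
import random
from statistics import pvariance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sample_peptides(n, length=9, rng=None):
    """Randomly sample n peptides from the universe of sequences.

    The default length of 9 is an assumption; the abstract gives no length.
    """
    rng = rng or random.Random()
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            for _ in range(n)]


def select_queries(candidates, committee, k):
    """Return the k candidates on which the committee disagrees most.

    `committee` is a list of callables mapping a peptide string to a
    real-valued prediction (hypothetical stand-ins for trained HMMs).
    Disagreement is the population variance of those predictions, which
    is an assumed measure, not one stated in the abstract.
    """
    scored = [(pvariance([member(p) for member in committee]), p)
              for p in candidates]
    # Highest variance first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy committee: each member scores a peptide by counting one residue.
    committee = [lambda p, aa=aa: float(p.count(aa)) for aa in "ALV"]
    pool = sample_peptides(1000, rng=rng)
    to_test = select_queries(pool, committee, k=10)
    print(to_test)  # peptides that would go to the binding experiment
```

In a full cycle, the selected peptides would be measured experimentally, the new binding data added to the training set, the committee retrained, and the selection repeated for the next round.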