Winkler David A, Burden Frank R
Centre for Complexity in Drug Discovery, CSIRO Molecular and Health Technologies, Clayton, Australia.
Methods Mol Biol. 2007;409:365-77. doi: 10.1007/978-1-60327-118-9_27.
Methods for predicting the binding affinity of peptides to the major histocompatibility complex (MHC) have become more sophisticated over the past 5-10 years. Computational quantitative structure-activity relationship (QSAR) methods can now be used to build models of peptide affinity that are truly predictive. Two of the most useful model-building methods are Bayesian regularized neural networks, for continuous or discrete (categorical) data, and support vector machines (SVMs), for discrete data. We illustrate the application of Bayesian regularized neural networks to modeling the MHC class II-binding affinity of peptides. Training data comprised sequences and binding data for nonamer (nine-amino-acid) peptides. Peptides were characterized by several types of mathematical representation. Independent test data comprised sequences and binding data for peptides of length ≤ 25. We also validated the models internally by placing 30% of the data in an internal test set. We obtained robust models, with near-identical statistics across multiple training runs. We assessed how predictive the models were using statistical tests and the area under the receiver operating characteristic (ROC) curve (A(ROC)). Some mathematical representations of the peptides were more efficient than others and were able to generalize to unknown peptides outside the training space. Bayesian neural networks are robust, efficient "universal approximators" that are well able to tackle the difficult problem of correctly predicting the MHC class II-binding activities of a majority of the test set peptides.
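As an illustration of what a "mathematical representation" of a peptide can look like, the minimal sketch below encodes a nonamer as a binary (one-hot) vector of 9 × 20 = 180 indicator variables, and splits a longer test peptide into overlapping nine-residue windows. Both are assumptions for illustration only: the abstract does not say which representations were used or how peptides longer than nine residues were handled, and the names `one_hot_encode` and `nonamer_windows` are hypothetical.

```python
# Hypothetical sketch: one possible peptide representation (one-hot encoding).
# Assumption: not necessarily one of the representations used in the study.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide: str, length: int = 9) -> list[int]:
    """Return a flat 0/1 vector of size length * 20 for a peptide of that length."""
    peptide = peptide.upper()
    if len(peptide) != length:
        raise ValueError(f"expected a {length}-mer, got {len(peptide)} residues")
    vector = [0] * (length * len(AMINO_ACIDS))
    for position, residue in enumerate(peptide):
        if residue not in INDEX:
            raise ValueError(f"unknown residue {residue!r} at position {position}")
        # one block of 20 indicators per position; set the slot for this residue
        vector[position * len(AMINO_ACIDS) + INDEX[residue]] = 1
    return vector


def nonamer_windows(peptide: str, length: int = 9) -> list[str]:
    """Hypothetical helper: all overlapping windows of the given length."""
    return [peptide[i:i + length] for i in range(len(peptide) - length + 1)]


features = one_hot_encode("PKYVKQNTL")
print(len(features), sum(features))  # 180 9
```

A 25-residue test peptide yields 17 such windows; one simple (assumed) way to score it would be to take the highest predicted affinity over its windows.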
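For readers unfamiliar with Bayesian regularization, the training objective usually meant by the term (MacKay's evidence framework) is sketched below. The notation is the standard one from that framework, not taken from this chapter: the network minimizes a weighted sum of the data error and a weight-decay penalty, and the hyperparameters α and β are re-estimated from the data rather than tuned by hand.

```latex
M(\mathbf{w}) = \beta E_D + \alpha E_W, \qquad
E_D = \tfrac{1}{2}\sum_{n=1}^{N}\bigl(t_n - y(\mathbf{x}_n;\mathbf{w})\bigr)^2, \qquad
E_W = \tfrac{1}{2}\sum_{i=1}^{W} w_i^2

\gamma = \sum_{i=1}^{W} \frac{\lambda_i}{\lambda_i + \alpha}, \qquad
\alpha \leftarrow \frac{\gamma}{2E_W}, \qquad
\beta \leftarrow \frac{N - \gamma}{2E_D}
```

Here N is the number of training peptides, W the number of network weights, and λ_i the eigenvalues of β∇∇E_D at the current weights. The quantity γ is the effective number of well-determined parameters, which is what allows the method to limit model complexity without a separate validation set.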
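The internal validation and the A(ROC) statistic mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: a random 70/30 hold-out split, class labels 1 (binder) and 0 (non-binder), and higher scores meaning "more likely to bind". The function names are hypothetical, and the area is computed by pairwise (Mann-Whitney) comparison rather than by any procedure described in the chapter.

```python
# Hypothetical sketch: 70/30 internal hold-out split and A(ROC) computed as the
# Mann-Whitney statistic. Assumption: labels are 1 = binder, 0 = non-binder.
import random


def holdout_split(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of items; return (train, test) with ~test_fraction in test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """Probability that a random binder outscores a random non-binder.

    Ties count as half a win, which equals the trapezoidal area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


train, test = holdout_split(range(100))
print(len(train), len(test))  # 70 30
print(round(roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]), 3))  # 0.889
```

An A(ROC) of 0.5 corresponds to random ranking and 1.0 to perfect separation of binders from non-binders.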