Nielsen Morten, Lundegaard Claus, Worning Peder, Hvid Christina Sylvester, Lamberth Kasper, Buus Søren, Brunak Søren, Lund Ole
Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, Technical University of Denmark, DK-2800 Lyngby, Denmark.
Bioinformatics. 2004 Jun 12;20(9):1388-97. doi: 10.1093/bioinformatics/bth100. Epub 2004 Feb 12.
Predicting which peptides will bind a specific major histocompatibility complex (MHC) is an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution, which complicates such predictions. Identifying the correct alignment is therefore a crucial part of identifying the core of an MHC class II binding motif. In this context, we describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on Gibbs sampling and incorporates novel features optimized for recognizing the binding motifs of MHC classes I and II. It locates the binding motif in a set of sequences and characterizes the motif in terms of a weight matrix. The weight matrix can subsequently be used to identify potential MHC-binding peptides effectively and to guide rational vaccine design.
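To make the idea of locating a binding core and summarizing it as a weight matrix concrete, the following is a minimal sketch of a textbook Gibbs site sampler. It is not the authors' implementation: the novel features mentioned in the abstract are not reproduced, and the function names, the uniform background, the flat pseudocount, the 9-residue core width, and the number of sweeps are all illustrative assumptions. Peptides are assumed to contain only the 20 standard amino acids and to be at least as long as the core.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_matrix(cores, pseudocount=1.0):
    """Build a position-specific log-odds matrix (in bits) from aligned cores.

    Assumption: uniform background frequencies and a flat pseudocount.
    """
    width = len(cores[0])
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for core in cores:
            counts[core[pos]] += 1.0
        total = sum(counts.values())
        matrix.append(
            {aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS}
        )
    return matrix


def score(matrix, core):
    """Sum the per-position log-odds values for one candidate core."""
    return sum(column[aa] for column, aa in zip(matrix, core))


def gibbs_align(peptides, width=9, sweeps=200, seed=0):
    """Sample one core start offset per peptide; return offsets and final matrix."""
    rng = random.Random(seed)
    # Start from a random alignment.
    offsets = [rng.randrange(len(p) - width + 1) for p in peptides]
    for _ in range(sweeps):
        for i, peptide in enumerate(peptides):
            # Build the matrix from every peptide except the one being resampled.
            others = [
                q[o:o + width]
                for j, (q, o) in enumerate(zip(peptides, offsets))
                if j != i
            ]
            matrix = log_odds_matrix(others)
            starts = list(range(len(peptide) - width + 1))
            # Draw the new offset with probability proportional to the odds ratio.
            weights = [2.0 ** score(matrix, peptide[s:s + width]) for s in starts]
            offsets[i] = rng.choices(starts, weights=weights)[0]
    cores = [p[o:o + width] for p, o in zip(peptides, offsets)]
    return offsets, log_odds_matrix(cores)
```

Calling `gibbs_align(peptides)` on a list of binder sequences returns one core offset per peptide together with the matrix built from the final alignment. Because the sampler is stochastic, different seeds can converge to different alignments.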
We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is a set of peptide amino acid sequences extracted from the public databases SYFPEITHI and MHCPEP and known to bind the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristic (ROC) curves, and we compare it in detail with that of both the TEPITOPE method and a weight matrix derived using the conventional alignment algorithm of ClustalW. The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and in most cases also higher than that of the TEPITOPE method.
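The evaluation described above can be illustrated with a short sketch. It assumes that the ensemble average is taken element-wise over weight matrices and that a peptide is scored by its best-scoring window; neither detail is stated in the abstract, and all function names are hypothetical. The area under the ROC curve is computed as the fraction of (binder, non-binder) pairs that the scores rank correctly.

```python
def average_matrices(matrices):
    """Element-wise mean of several weight matrices of identical shape.

    Each matrix is a list of columns; each column maps residue -> score.
    Assumption: this is one simple way to form a consensus solution.
    """
    width = len(matrices[0])
    return [
        {
            aa: sum(m[pos][aa] for m in matrices) / len(matrices)
            for aa in matrices[0][pos]
        }
        for pos in range(width)
    ]


def best_core_score(matrix, peptide):
    """Score a peptide by its highest-scoring window of the matrix width."""
    width = len(matrix)
    return max(
        sum(column[aa] for column, aa in zip(matrix, peptide[s:s + width]))
        for s in range(len(peptide) - width + 1)
    )


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

An AUC of 1.0 means every binder outscores every non-binder, while 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of peptides, which is acceptable for a sketch but would be replaced by a rank-based computation on large benchmarks.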