Identifying HLA supertypes by learning distance functions.

Author information

Hertz Tomer, Yanover Chen

Affiliation

School of Computer Science and Engineering, Israel.

Publication information

Bioinformatics. 2007 Jan 15;23(2):e148-55. doi: 10.1093/Bioinformatics/btl324.

DOI:10.1093/Bioinformatics/btl324

PMID:17237084

Abstract

MOTIVATION

The development of epitope-based vaccines crucially relies on the ability to classify Human Leukocyte Antigen (HLA) molecules into sets that have similar peptide binding specificities, termed supertypes. In their seminal work, Sette and Sidney defined nine HLA class I supertypes and claimed that these provide an almost perfect coverage of the entire repertoire of HLA class I molecules. HLA alleles are highly polymorphic and polygenic and therefore experimentally classifying each of these molecules to supertypes is at present an impossible task. Recently, a number of computational methods have been proposed for this task. These methods are based on defining protein similarity measures, derived from analysis of binding peptides or from analysis of the proteins themselves.

RESULTS

In this paper we define both peptide-derived and protein-derived similarity measures, which are based on learning distance functions. The peptide-derived measure is defined using a peptide-peptide distance function, which is learned using information about known binding and non-binding peptides. The protein-derived similarity measure is defined using a protein-protein distance function, which is learned using information about alleles previously classified into supertypes by Sette and Sidney (1999). We compare the classifications obtained by these two complementary methods to previously suggested classification methods. In general, our results are in excellent agreement with the classifications suggested by Sette and Sidney (1999) and with those reported by Buus et al. (2004). The main advantage of our proposed distance-based approach is that it makes use of two different and important immunological sources of information: HLA alleles, and peptides that are known to bind or not bind to these alleles. Since each of our distance measures is trained using a different source of information, their combination can provide a more confident classification of alleles into supertypes.
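As a rough illustration of the last point, the sketch below shows one way two allele-allele distance matrices could be combined and then grouped into candidate supertypes. It is not the authors' method: the abstract does not describe how the distance functions are learned or how they are combined, so the equal-weight average, the average-linkage clustering, the function names, the allele labels (`a1`, `a2`, `b1`, `b2`), and every distance value here are assumptions made up for the example.

```python
from itertools import combinations


def symmetric(names, pairs):
    """Build a symmetric distance matrix (dict of dicts) from pairwise values."""
    d = {a: {b: 0.0 for b in names} for a in names}
    for (a, b), value in pairs.items():
        d[a][b] = d[b][a] = value
    return d


def combine(d_pep, d_prot, w=0.5):
    """Weighted average of two distance matrices (assumed combination rule)."""
    return {a: {b: w * d_pep[a][b] + (1 - w) * d_prot[a][b] for b in d_pep[a]}
            for a in d_pep}


def average_linkage(dist, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [frozenset([a]) for a in dist]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs.
        return sum(dist[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Hypothetical alleles and made-up distances, for illustration only.
names = ["a1", "a2", "b1", "b2"]
d_pep = symmetric(names, {("a1", "a2"): 0.1, ("a1", "b1"): 0.9, ("a1", "b2"): 0.8,
                          ("a2", "b1"): 0.7, ("a2", "b2"): 0.9, ("b1", "b2"): 0.2})
d_prot = symmetric(names, {("a1", "a2"): 0.3, ("a1", "b1"): 0.6, ("a1", "b2"): 0.9,
                           ("a2", "b1"): 0.8, ("a2", "b2"): 0.7, ("b1", "b2"): 0.1})

combined = combine(d_pep, d_prot)
supertypes = average_linkage(combined, n_clusters=2)
print(sorted(sorted(c) for c in supertypes))
```

With these toy values the two measures agree on which alleles are close, so the combined matrix yields the groups `a1, a2` and `b1, b2`. In practice the weight `w` and the number of clusters would need to be chosen and validated against known supertype assignments.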
