Brusic V, Schönbach C, Takiguchi M, Ciesielski V, Harrison L C
Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
Proc Int Conf Intell Syst Mol Biol. 1997;5:75-83.
T cells of the vertebrate immune system recognise peptides bound by major histocompatibility complex (MHC) molecules on the surface of host cells. Peptide binding to MHC molecules is necessary for immune recognition, but only a subset of peptides can bind to a particular MHC molecule. Common amino acid patterns (binding motifs) have been observed in sets of peptides that bind to specific MHC molecules. Recently, matrix models for peptide/MHC interaction have been reported. These encode the rules of peptide/MHC interaction for an individual MHC molecule as a 20 x 9 matrix in which the contribution to binding of each amino acid at each position within a 9-mer peptide is quantified. The artificial intelligence techniques of genetic search and machine learning have proved very useful in biological sequence analysis. The availability of peptide/MHC binding data makes it possible to derive binding matrices using machine learning techniques. We performed a simulation study to determine the minimum number of peptide samples required to derive matrices of a pre-defined accuracy. The matrices were derived using a genetic search. In addition, matrices for peptide binding to the human class I MHC molecules HLA-B35 and HLA-A24 were derived, validated against independent experimental data and compared with previously reported matrices. The results indicate that at least 150 peptide samples are required to derive matrices of acceptable accuracy. This result assumes a maximum noise content of 5%, the availability of precise affinity measurements, and an acceptable-accuracy threshold of Aroc > 0.8, where Aroc is the area under the Relative Operating Characteristic curve. More than 600 peptide samples are required to derive matrices of excellent accuracy (Aroc > 0.9). Finally, we derived a human HLA-B27 binding matrix using a genetic search and 404 experimentally tested peptides, and estimated its accuracy at Aroc > 0.88.
The results of this study are expected to be of practical interest to immunologists for efficient identification of peptides as candidates for immunotherapy.
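The matrix model and the Aroc criterion described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the amino-acid row order, the use of training Aroc on binder/non-binder labels as the genetic-search fitness, the genetic operators (truncation selection, elitism, uniform crossover, Gaussian mutation) and all parameter values are hypothetical choices made for illustration. A 9-mer is scored by summing, over its nine positions, the matrix entry for the residue found at that position; Aroc is computed in its pairwise (Mann-Whitney) form, i.e. the fraction of binder/non-binder pairs in which the binder scores higher.

```python
import random
from typing import List, Sequence

# Hypothetical row order for the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LENGTH = 9  # matrix columns correspond to peptide positions P1..P9

Matrix = List[List[float]]  # 20 x 9, indexed as matrix[residue_row][position]


def score(matrix: Matrix, peptide: str) -> float:
    """Sum the per-position contributions of each residue in a 9-mer."""
    if len(peptide) != PEPTIDE_LENGTH:
        raise ValueError("expected a 9-mer peptide")
    return sum(matrix[AMINO_ACIDS.index(aa)][pos] for pos, aa in enumerate(peptide))


def aroc(binder_scores: Sequence[float], nonbinder_scores: Sequence[float]) -> float:
    """Area under the ROC curve in pairwise form: the fraction of
    (binder, non-binder) pairs in which the binder scores higher (ties count 1/2)."""
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            wins += 1.0 if b > n else 0.5 if b == n else 0.0
    return wins / (len(binder_scores) * len(nonbinder_scores))


def fitness(matrix: Matrix, binders: Sequence[str], nonbinders: Sequence[str]) -> float:
    # Hypothetical fitness: training Aroc of the matrix on labelled peptides.
    return aroc([score(matrix, p) for p in binders], [score(matrix, p) for p in nonbinders])


def random_matrix(rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(PEPTIDE_LENGTH)] for _ in AMINO_ACIDS]


def crossover(a: Matrix, b: Matrix, rng: random.Random) -> Matrix:
    # Uniform crossover: each cell is copied from one parent chosen at random.
    return [[x if rng.random() < 0.5 else y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mutate(m: Matrix, rng: random.Random, rate: float = 0.05, sigma: float = 0.3) -> Matrix:
    # Gaussian mutation applied to roughly `rate` of the cells (hypothetical settings).
    return [[w + rng.gauss(0.0, sigma) if rng.random() < rate else w for w in row] for row in m]


def genetic_search(binders: Sequence[str], nonbinders: Sequence[str],
                   pop_size: int = 30, generations: int = 40, seed: int = 0) -> Matrix:
    """Evolve a 20 x 9 matrix that separates binders from non-binders (pop_size >= 4)."""
    rng = random.Random(seed)
    population = [random_matrix(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, binders, nonbinders), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection: keep the top half
        children = [ranked[0]]             # elitism: the best matrix survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=lambda m: fitness(m, binders, nonbinders))
```

Because the best matrix of each generation is carried over unchanged, the training Aroc of the returned matrix never falls below that of the best random starting matrix. A high training Aroc on a small sample does not by itself establish accuracy on independent data.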