Bremel Robert D, Homan E Jane
ioGenetics LLC, 3591 Anderson Street, Madison, WI 53704, USA.
Immunome Res. 2010 Nov 2;6:7. doi: 10.1186/1745-7580-6-7.
Operation of the immune system is multivariate. Reducing the dimensionality is essential to understanding this complex biological system. One multi-dimensional facet of the immune system is the binding of epitopes to the MHC-I and MHC-II molecules of diverse populations of individuals. Prediction of such epitope binding is critical, and several immunoinformatic strategies utilizing amino acid substitution matrices have been designed to develop predictive algorithms. Contemporaneously, computational and statistical tools have evolved to handle multivariate and megavariate analysis, but these have not been systematically deployed in the prediction of MHC binding. Partial least squares analysis, principal component analysis, and associated regression techniques have become the norm for handling complex datasets in many fields. Over two decades ago, Wold and colleagues showed that principal components of amino acids could be used to predict peptide binding to cellular receptors. We have applied this observation to the analysis of MHC binding and to the derivation of predictive methods applicable on a whole-proteome scale.
We show that amino acid principal components and partial least squares approaches can be utilized to visualize the underlying physicochemical properties of the MHC binding domain by using commercially available software. We further show the application of amino acid principal components to develop both linear partial least squares and non-linear neural network regression prediction algorithms for MHC-I and MHC-II molecules. Several visualization options for the output aid in understanding the underlying physicochemical properties, enable confirmation of earlier work on the relative importance of certain peptide residues to MHC binding, and also provide new insights into differences among MHC molecules. We compared both the linear and non-linear MHC binding prediction tools to several predictive tools currently available on the Internet.
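To make the idea of amino acid principal components concrete, the following is a minimal Python sketch, not the authors' JMP® implementation. It assumes a small illustrative property table (Kyte-Doolittle hydropathy, approximate molecular mass, and approximate isoelectric point), which is not the descriptor set used in the study. The function names `amino_acid_principal_components` and `encode_peptide` are hypothetical. The sketch derives principal component scores for the 20 amino acids and then represents a peptide as the concatenated scores of its residues, the kind of numeric input a partial least squares or neural network regression could take.

```python
import numpy as np

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Illustrative property table (an assumption, not the paper's descriptors):
# columns = Kyte-Doolittle hydropathy, approx. molecular mass (Da), approx. pI
PROPERTIES = np.array([
    [ 1.8,  89.09,  6.00],  # A
    [-4.5, 174.20, 10.76],  # R
    [-3.5, 132.12,  5.41],  # N
    [-3.5, 133.10,  2.77],  # D
    [ 2.5, 121.16,  5.07],  # C
    [-3.5, 146.15,  5.65],  # Q
    [-3.5, 147.13,  3.22],  # E
    [-0.4,  75.07,  5.97],  # G
    [-3.2, 155.16,  7.59],  # H
    [ 4.5, 131.17,  6.02],  # I
    [ 3.8, 131.17,  5.98],  # L
    [-3.9, 146.19,  9.74],  # K
    [ 1.9, 149.21,  5.74],  # M
    [ 2.8, 165.19,  5.48],  # F
    [-1.6, 115.13,  6.30],  # P
    [-0.8, 105.09,  5.68],  # S
    [-0.7, 119.12,  5.60],  # T
    [-0.9, 204.23,  5.89],  # W
    [-1.3, 181.19,  5.66],  # Y
    [ 4.2, 117.15,  5.96],  # V
])


def amino_acid_principal_components(props, n_components=3):
    """Return per-residue PC scores and the variance fraction of each PC."""
    # Standardize each property so that no single unit dominates the PCA.
    z = (props - props.mean(axis=0)) / props.std(axis=0)
    # PCA via singular value decomposition of the standardized matrix.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / (s ** 2).sum()
    return scores, explained[:n_components]


def encode_peptide(peptide, scores):
    """Concatenate the PC scores of each residue into one feature vector."""
    idx = [AMINO_ACIDS.index(aa) for aa in peptide]
    return scores[idx].ravel()


scores, explained = amino_acid_principal_components(PROPERTIES)
features = encode_peptide("SIINFEKL", scores)  # example 8-mer peptide
```

With three components, an 8-residue peptide becomes a 24-element vector. A set of such vectors paired with measured binding affinities could then serve as the predictor matrix for a regression model; no binding data are included here.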
As opposed to the highly constrained user-interaction paradigms of web-server approaches, local computational approaches enable interactive analysis and visualization of complex multidimensional data using robust mathematical tools. Our work shows that prediction tools such as these can be constructed on the widely available JMP® platform, can operate in a spreadsheet environment on a desktop computer, and are capable of handling proteome-scale analysis with high throughput.