Department of Statistics and Operational Research, Universidad Nacional de Educación a Distancia, Paseo Senda del Rey 9, 28040 Madrid, Spain.
Center for Vaccine Development, Departments of Pediatrics and Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
J Biomed Inform. 2017 Oct;74:1-9. doi: 10.1016/j.jbi.2017.08.005. Epub 2017 Aug 9.
Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model.
The proposed methodology uses the Random Forests (RF) machine learning algorithm together with Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression is then used to estimate the probability of protection, and a confidence interval (CI) for that probability is computed by bootstrapping the logistic regression models.
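To make two of these steps concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It finds a CART-style single-split cutoff on one marker by minimizing weighted Gini impurity, then computes a percentile bootstrap CI for the proportion of protected subjects at or above that cutoff. As a simplification, it bootstraps the raw proportion rather than refitting a logistic regression model on each resample, and it omits the Random Forests step entirely. The function names and the toy data are hypothetical and are not taken from the study.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_cutoff(values, labels):
    """Threshold on one marker that minimizes weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct
    sorted values, as in a single CART split.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def bootstrap_ci(labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the proportion of 1s in labels."""
    rng = random.Random(seed)
    n = len(labels)
    props = sorted(
        sum(rng.choice(labels) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = props[int((alpha / 2) * n_boot)]
    hi = props[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical toy data: one marker value per subject and a 0/1 protection flag.
titers = [5, 10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240]
protected = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

cutoff = best_cutoff(titers, protected)
above = [y for x, y in zip(titers, protected) if x >= cutoff]
lo, hi = bootstrap_ci(above)
print(f"cutoff={cutoff}, protected above cutoff={sum(above)}/{len(above)}, "
      f"95% CI=({lo:.2f}, {hi:.2f})")
```

With so few subjects the interval is wide, which is the situation in which a resampling-based interval is most informative about how uncertain the estimate is.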
The results demonstrate that combining Classification and Regression Trees with Random Forests complements standard logistic regression and uncovers subtle immune interactions. Specific levels of IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Among subjects who did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody-secreting cells above a defined threshold. A comparison with the results obtained using logistic regression alone, with the standard Akaike Information Criterion for model selection, shows the usefulness of the proposed method.
Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines.