Institute of Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94304;
Oxford Vaccine Group, Department of Pediatrics, University of Oxford, Oxford OX3 9DU, United Kingdom.
J Immunol. 2019 Aug 1;203(3):749-759. doi: 10.4049/jimmunol.1900033. Epub 2019 Jun 14.
Machine learning holds considerable promise for understanding complex biological processes such as vaccine responses. Capturing interindividual variability is essential for increasing the statistical power needed to build more accurate predictive models. However, available approaches have difficulty coping with incomplete datasets, which are common when data from multiple studies are combined. Additionally, hundreds of algorithms are available, and there is no simple way to find the optimal one. In this study, we developed Sequential Iterative Modeling "OverNight" (SIMON), an automated machine learning system that compares results from 128 different algorithms and is particularly suitable for datasets containing many missing values. We applied SIMON to data from five clinical studies of seasonal influenza vaccination. The results reveal previously unrecognized CD4 and CD8 T cell subsets strongly associated with a robust antibody (Ab) response to influenza antigens (Ags). These results demonstrate that SIMON can greatly speed up the choice of analysis modalities. Hence, it is a highly useful approach for data-driven hypothesis generation from disparate clinical datasets. Our strategy could be used to gain biological insight from the ever-expanding heterogeneous datasets that are publicly available.
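The general idea described in the abstract, comparing several algorithms on data that contain missing values, can be illustrated with a minimal sketch. This is not SIMON's implementation: the toy data, the function names (`complete_cases`, `rank_models`, and the rest), and the three simple classifiers standing in for the 128 algorithms are all hypothetical, and dropping incomplete rows for a chosen feature set is only one assumed way of handling missing values.

```python
import math
from collections import Counter

# Hypothetical toy data: (features, label); None marks a missing value.
ROWS = [
    ({"a": 1.0, "b": 1.0}, "high"),
    ({"a": 1.2, "b": 0.8}, "high"),
    ({"a": 0.9, "b": 1.1}, "high"),
    ({"a": 0.1, "b": 0.2}, "low"),
    ({"a": 0.2, "b": 0.1}, "low"),
    ({"a": 0.0, "b": 0.3}, "low"),
    ({"a": 1.1, "b": None}, "high"),
    ({"a": None, "b": 0.2}, "low"),
]


def complete_cases(rows, features):
    """Keep only rows with no missing value among the chosen features."""
    return [
        ([f[k] for k in features], y)
        for f, y in rows
        if all(f.get(k) is not None for k in features)
    ]


def majority(train, x):
    """Baseline: always predict the most frequent training label."""
    return Counter(y for _, y in train).most_common(1)[0][0]


def nearest_neighbor(train, x):
    """Predict the label of the closest training point (Euclidean)."""
    return min(train, key=lambda r: math.dist(r[0], x))[1]


def nearest_centroid(train, x):
    """Predict the label whose class mean is closest to x."""
    groups = {}
    for feats, y in train:
        groups.setdefault(y, []).append(feats)
    centroids = {
        y: [sum(col) / len(col) for col in zip(*pts)]
        for y, pts in groups.items()
    }
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def loo_accuracy(model, data):
    """Leave-one-out accuracy of a model given as model(train, x) -> label."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += model(train, x) == y
    return hits / len(data)


def rank_models(rows, features, models):
    """Score every model on the complete cases and sort best-first."""
    data = complete_cases(rows, features)
    scores = {name: loo_accuracy(fn, data) for name, fn in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


MODELS = {
    "nearest_neighbor": nearest_neighbor,
    "nearest_centroid": nearest_centroid,
    "majority": majority,
}

if __name__ == "__main__":
    for name, acc in rank_models(ROWS, ["a", "b"], MODELS):
        print(f"{name}: {acc:.2f}")
```

Calling `rank_models` with a different feature list (for example `["a"]` alone) changes which rows survive the complete-case filter, which illustrates the trade-off between the number of features and the number of usable samples when studies are combined.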