Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA.
Department of Pathology, Yale School of Medicine, New Haven, CT, USA.
Hum Vaccin Immunother. 2023 Aug 1;19(2):2251830. doi: 10.1080/21645515.2023.2251830.
Overfitting describes the phenomenon in which a model that is highly predictive on the training data generalizes poorly to future observations. It is a common concern when applying machine learning techniques to contemporary medical applications, such as predicting vaccination response and disease status in infectious disease or cancer studies. This review examines the causes of overfitting and offers strategies to counteract it, focusing on reducing model complexity, evaluating models reliably, and harnessing data diversity. Through discussion of the underlying mathematical models and illustrative examples using both synthetic data and published real datasets, our objective is to equip analysts and bioinformaticians with the knowledge and tools necessary to detect and mitigate overfitting in their research.
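The definition above can be made concrete with a small synthetic experiment. The following is a minimal sketch, not one of the review's own examples: the noisy sine curve, the sample sizes, the polynomial degrees (3 and 15), and the helper names `make_data` and `fit_and_score` are all assumptions chosen for illustration. It fits a low-complexity and a high-complexity polynomial to the same small training set and compares the mean squared error (MSE) on the training data with the MSE on fresh data from the same distribution.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def make_data(n, noise_sd=0.3):
    """Hypothetical synthetic data: a sine curve plus Gaussian noise."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, noise_sd, size=n)
    return x, y


def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Fit a least-squares polynomial; return (train MSE, test MSE)."""
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    return train_mse, test_mse


# Small training set, larger held-out set drawn from the same distribution.
x_train, y_train = make_data(20)
x_test, y_test = make_data(500)

# Degree 3 is a simple model; degree 15 has nearly one parameter per
# training point and can chase the noise.
train_low, test_low = fit_and_score(3, x_train, y_train, x_test, y_test)
train_high, test_high = fit_and_score(15, x_train, y_train, x_test, y_test)

print(f"degree  3: train MSE = {train_low:.3g}, test MSE = {test_low:.3g}")
print(f"degree 15: train MSE = {train_high:.3g}, test MSE = {test_high:.3g}")
```

In a run like this, the higher-degree model is expected to show the lower training error but the higher error on the held-out data, which is the gap the term overfitting refers to. A held-out set is used here only as the simplest form of model evaluation.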