Zhong Yuhua
School of Science, Sun Yat-sen University, Shenzhen, 518307, China.
Sci Rep. 2025 Jul 3;15(1):23788. doi: 10.1038/s41598-025-08882-7.
Basketball remains among the most popular sports worldwide, and its competitions draw substantial attention. The analysis and modeling of basketball game data have long been central topics in sports analytics, and in recent years machine learning techniques have significantly advanced the prediction of game outcomes. However, most existing studies focus on NBA data, with relatively limited exploration of other leagues. To address this gap, this study uses game data from the Chinese Basketball Association spanning the 2021-2024 seasons to develop predictive models. It is the first to apply the classical Four Factors model and Defense-Offense model, along with their derivative versions (the Four Factors detailed model and the Defense-Offense detailed model), to the Chinese Men's Professional Basketball League, providing a baseline for prediction. To ensure that the models are applicable in real-world scenarios, this study uses only data available before the start of each game as feature variables for training. This restriction means that the enhanced models are evaluated under the same conditions in which they would be used, so that strong results in evaluation carry over to reliable predictions in practice. To evaluate model performance, a diverse set of machine learning algorithms is employed, including support vector machines, Naive Bayes, k-nearest neighbors, logistic regression, a multi-layer perceptron with contrastive loss, and XGBoost, and the models are compared on Accuracy, F1 Score, Recall, Precision, and AUROC. The results show that incorporating additional features substantially improves predictive performance. In particular, under logistic regression, the newly developed model based on the Four Factors detailed model achieves an accuracy of 85.49%, the highest among all evaluated approaches.
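The pre-game constraint described above can be illustrated with a minimal sketch. It assumes the standard Dean Oliver definitions of the Four Factors and a simple running average over a team's earlier games; the field names, the `BoxScore` type, and the averaging scheme are hypothetical choices made for illustration and are not taken from the study, whose exact feature construction the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BoxScore:
    """One team's totals for a single game (hypothetical field names)."""
    fgm: int      # field goals made
    fga: int      # field goals attempted
    fg3m: int     # three-pointers made
    fta: int      # free throws attempted
    ftm: int      # free throws made
    tov: int      # turnovers
    orb: int      # offensive rebounds
    opp_drb: int  # opponent's defensive rebounds


def four_factors(b: BoxScore) -> Dict[str, float]:
    """Four Factors for one game, using the commonly cited definitions."""
    return {
        # Effective field goal percentage: threes count 1.5x.
        "efg": (b.fgm + 0.5 * b.fg3m) / b.fga,
        # Turnovers per estimated possession.
        "tov_pct": b.tov / (b.fga + 0.44 * b.fta + b.tov),
        # Share of available offensive rebounds secured.
        "orb_pct": b.orb / (b.orb + b.opp_drb),
        # Free throw rate; some sources use FTA / FGA instead (assumption).
        "ft_rate": b.ftm / b.fga,
    }


def pregame_features(
    history: List[BoxScore],
) -> List[Optional[Dict[str, float]]]:
    """For game i, average the factors over games 0..i-1 only.

    Game i's own box score is never used for its own features, so
    nothing from the game being predicted leaks into the input.
    The first game has no history, so its entry is None.
    """
    out: List[Optional[Dict[str, float]]] = []
    sums: Dict[str, float] = {}
    for i, box in enumerate(history):
        # Emit features BEFORE folding in the current game.
        out.append({k: v / i for k, v in sums.items()} if i > 0 else None)
        for k, v in four_factors(box).items():
            sums[k] = sums.get(k, 0.0) + v
    return out
```

In a setup like this, the feature vector for a matchup would typically combine the pre-game averages of both teams (for example, as differences) before being passed to a classifier such as logistic regression; that combination step is likewise an assumption here.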