Kholkine Leonid, Servotte Thomas, de Leeuw Arie-Willem, De Schepper Tom, Hellinckx Peter, Verdonck Tim, Latré Steven
Department of Computer Science, University of Antwerp-IMEC, Antwerp, Belgium.
Department of Mathematics, University of Antwerp, Antwerp, Belgium.
Front Sports Act Living. 2021 Oct 6;3:714107. doi: 10.3389/fspor.2021.714107. eCollection 2021.
Professional road cycling is a highly competitive sport, and many factors influence the outcome of a race. These factors can be internal to the rider (e.g., psychological preparedness, physiological profile, and preparedness or fitness), external (e.g., the weather or team strategy), or completely unpredictable (e.g., crashes or mechanical failures). This variety makes perfectly predicting the outcome of a given race impossible, and makes the sport all the more interesting. Nonetheless, before each race, journalists, ex-professional cyclists, websites, and cycling fans try to predict the likely top 3, 5, or 10 riders. In this article, we use easily accessible road cycling data from the past 20 years and the Machine Learning technique Learn-to-Rank (LtR) to predict the top 10 contenders for 1-day road cycling races. We accomplish this by mapping each of the first 10 finishing positions to a relevance weight. We assess the performance of this approach on the 2018, 2019, and 2021 editions of six spring classic 1-day races. Finally, we compare the output of the framework with a mass fan prediction on two measures: the Normalized Discounted Cumulative Gain (NDCG) metric and the number of correct top 10 guesses. We found that, on average, our model performs slightly better than the mass fan prediction on both metrics. We also analyze which variables of our model have the most influence on the prediction for each race. This approach can give fans interesting insights before a race, and can also help sports coaches predict how a rider might perform relative to riders outside the team.
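The evaluation described above (a relevance weight for each of the first 10 finishing positions, scored with NDCG and with the number of correct top 10 guesses) can be illustrated with a minimal sketch. The abstract does not state the exact weight mapping or gain formula, so both are assumptions here: the hypothetical `relevance_from_position` maps 1st place to 10, 10th place to 1, and everything else to 0, and DCG uses linear gains with a log2(rank + 1) discount. The function names and the toy riders "A" to "D" are illustrative only and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def relevance_from_position(position: int, top_k: int = 10) -> int:
    """Hypothetical linear mapping of a finishing position to a relevance weight.

    1st place -> top_k, 2nd -> top_k - 1, ..., top_k-th -> 1; any other
    position (or 0 for "not in the results") -> 0. This is an assumption.
    """
    if 1 <= position <= top_k:
        return top_k + 1 - position
    return 0


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gains: rank r is divided by log2(r + 1)."""
    # enumerate() is 0-based, so rank r = i + 1 and the discount is log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(predicted: List[str], result: Dict[str, int], k: int = 10) -> float:
    """NDCG@k of a predicted rider ordering against actual finishing positions."""
    # Gains of the riders the model placed in its top k (unknown riders score 0).
    gains = [relevance_from_position(result.get(rider, 0), k) for rider in predicted[:k]]
    # Ideal ordering: the true top k sorted by relevance, best first.
    ideal = sorted((relevance_from_position(p, k) for p in result.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def correct_top_k(predicted: List[str], result: Dict[str, int], k: int = 10) -> int:
    """Number of riders in the predicted top k who actually finished in the top k."""
    actual_top = {rider for rider, p in result.items() if 1 <= p <= k}
    return len(set(predicted[:k]) & actual_top)


if __name__ == "__main__":
    # Toy race scored on a top 3 instead of a top 10, purely for illustration.
    result = {"A": 1, "B": 2, "C": 3, "D": 4}
    prediction = ["B", "A", "D"]
    print(round(ndcg_at_k(prediction, result, k=3), 3))  # 0.817
    print(correct_top_k(prediction, result, k=3))        # 2
```

In the toy example, swapping the first two riders and missing 3rd place still earns most of the available gain (NDCG of about 0.817) and two correct top 3 guesses. This shows why the two measures are complementary: NDCG rewards getting the order right near the top, while the hit count ignores order entirely.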