Department of Computer Science and Operations Research, Université de Montréal, Canada.
Department of Computer Science and Operations Research, Université de Montréal, Canada; Mila - Quebec AI Institute, Université de Montréal, Canada.
J Biomech. 2023 Jun;154:111606. doi: 10.1016/j.jbiomech.2023.111606. Epub 2023 Apr 30.
Clinical datasets often comprise multiple data points or trials sampled from a single participant. When these datasets are used to train machine learning models, the method used to split the data into training and test sets must be chosen carefully. With the standard machine learning approach (random-wise split), different trials from the same participant may appear in both the training and test sets. This has led to schemes that place all data points from the same participant into a single set (subject-wise split). Past investigations have shown that models trained in this manner underperform those trained with random-wise split schemes. Additional training of models on a small subset of trials, known as calibration, bridges the performance gap between split schemes; however, the number of calibration trials required to achieve strong model performance is unclear. This study therefore investigates the relationship between calibration training set size and prediction accuracy on the calibration test set. A database of 30 young, healthy adults performing multiple walking trials across nine different surfaces while fitted with inertial measurement unit sensors on the lower limbs was used to develop a deep-learning classifier. For subject-wise trained models, calibration on a single gait cycle per surface yielded a 70% increase in F1-score, the harmonic mean of precision and recall, while 10 gait cycles per surface were sufficient to match the performance of a random-wise trained model. Code to generate calibration curves may be found at (https://github.com/GuillaumeLam/PaCalC).
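The abstract contrasts a random-wise split, a subject-wise split, and calibration with k gait cycles per surface, scored by F1. The following is a minimal sketch of those three steps on a toy list of gait-cycle records. It is not the PaCalC code: the record fields `subject` and `surface`, all function names, and the use of macro-averaged F1 across surface classes are assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_wise_split(samples, test_frac=0.2, seed=0):
    """Shuffle all gait cycles and split them, ignoring participant identity.
    Cycles from one participant can land in both sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def subject_wise_split(samples, test_subjects):
    """Place every gait cycle of a test participant in the test set."""
    train = [s for s in samples if s["subject"] not in test_subjects]
    test = [s for s in samples if s["subject"] in test_subjects]
    return train, test


def calibration_split(test_samples, k_per_surface, seed=0):
    """Move k gait cycles per (participant, surface) pair from the test set
    into a calibration set; the remaining cycles form the calibration test set.
    (Hypothetical helper: the grouping key is an assumption.)"""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for s in test_samples:
        groups[(s["subject"], s["surface"])].append(s)
    calib, remaining = [], []
    for key in sorted(groups):
        cycles = groups[key][:]
        rng.shuffle(cycles)
        calib.extend(cycles[:k_per_surface])
        remaining.extend(cycles[k_per_surface:])
    return calib, remaining


def macro_f1(y_true, y_pred):
    """Per-class F1 (harmonic mean of precision and recall), averaged over classes."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy data: 30 participants x 9 surfaces x 20 gait cycles (sizes are illustrative).
samples = [{"subject": s, "surface": f, "cycle": c}
           for s in range(30) for f in range(9) for c in range(20)]
train, test = subject_wise_split(samples, test_subjects={0, 1, 2})
calib, calib_test = calibration_split(test, k_per_surface=1)
print(len(train), len(test), len(calib), len(calib_test))  # 4860 540 27 513
```

Repeating `calibration_split` for increasing `k_per_surface`, fine-tuning the subject-wise model on `calib`, and scoring on `calib_test` would trace a calibration curve of the kind the abstract describes; the model and training loop are deliberately left out here.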