Saranti Margarita, Neville Douglas, White Adam, Rotshtein Pia, Hope Thomas M H, Price Cathy J, Bowman Howard
School of Psychology, University of Birmingham, UK.
Department of Imaging Neuroscience, University College London, UK.
Neuroimage Clin. 2025 Aug 6;48:103858. doi: 10.1016/j.nicl.2025.103858.
Accurate prediction of post-stroke language outcomes using machine learning offers the potential to enhance clinical treatment and rehabilitation for aphasic patients. This study of 758 English-speaking stroke patients from the PLORAS project explores the impact of sample size on the performance of a logistic regression model and a deep learning (ResNet-18) model in predicting language outcomes from neuroimaging and impairment-relevant tabular data. Using a learning curve approach, we assessed the performance of both models on two key language tasks from the Comprehensive Aphasia Test: Spoken Picture Description and Naming. Contrary to expectations, the simpler logistic regression model performed comparably to or better than the deep learning model (with overlapping confidence intervals), and both models showed an accuracy plateau around 80% for sample sizes larger than 300 patients. Principal Component Analysis revealed that the dimensionality of the neuroimaging data could be reduced to as few as 20 (or even 2) dominant components without significant loss in accuracy, suggesting that classification may be driven by simple patterns such as lesion size. The study highlights both the potential limitations of the current dataset size in achieving further accuracy gains and the need for larger datasets to capture more complex patterns, as some of our results indicate that we might not have reached an absolute classification performance ceiling. Overall, these findings provide insights into the practical use of machine learning for predicting aphasia outcomes and the potential benefits of much larger datasets in enhancing model performance.
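The learning curve approach mentioned above can be illustrated with a minimal sketch: train the same classifier on progressively larger training subsets and record held-out accuracy at each size. Everything in the block below is an assumption made for illustration, not the study's pipeline. The data are synthetic (two made-up features, `lesion_size` and `severity`, standing in for a dominant imaging component and a tabular variable), the logistic regression is a plain gradient-descent implementation, and the helper names (`make_patient`, `fit_logistic`, `learning_curve`) are hypothetical.

```python
import math
import random


def make_patient(rng):
    """Return (features, label) for one synthetic 'patient' (hypothetical data)."""
    lesion_size = rng.gauss(0.0, 1.0)  # stand-in for a dominant imaging component
    severity = rng.gauss(0.0, 1.0)     # stand-in for a tabular feature
    logit = 1.5 * lesion_size + 1.0 * severity
    p = 1.0 / (1.0 + math.exp(-logit))
    label = 1 if rng.random() < p else 0
    return [lesion_size, severity], label


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    correct = sum(
        1 for xi, yi in zip(X, y)
        if (predict_proba(w, b, xi) >= 0.5) == (yi == 1)
    )
    return correct / len(y)


def learning_curve(train_sizes, n_test=300, n_repeats=5, seed=0):
    """Mean held-out accuracy for each training-set size, over random splits."""
    rng = random.Random(seed)
    pool = [make_patient(rng) for _ in range(max(train_sizes) + n_test)]
    curve = {}
    for size in train_sizes:
        scores = []
        for _ in range(n_repeats):
            rng.shuffle(pool)
            test = pool[:n_test]                   # held-out evaluation set
            train = pool[n_test:n_test + size]     # disjoint training subset
            w, b = fit_logistic([x for x, _ in train], [t for _, t in train])
            scores.append(accuracy(w, b, [x for x, _ in test], [t for _, t in test]))
        curve[size] = sum(scores) / len(scores)
    return curve


curve = learning_curve([25, 50, 100, 200, 400])
for size, acc in curve.items():
    print(f"n_train={size:4d}  mean test accuracy={acc:.3f}")
```

On data this simple, accuracy would be expected to level off at small training sizes, which is the kind of plateau a learning curve is meant to expose. Whether a plateau reflects a true performance ceiling or only the limits of the available sample is the question the abstract raises and this sketch cannot answer.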