Turrisi Rosanna, Verri Alessandro, Barla Annalisa
Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, Genoa, Italy.
Machine Learning Genoa (MaLGa) Center, University of Genoa, Genoa, Italy.
Front Comput Neurosci. 2024 Sep 20;18:1360095. doi: 10.3389/fncom.2024.1360095. eCollection 2024.
Machine Learning (ML) has emerged as a promising approach in healthcare, outperforming traditional statistical techniques. However, to establish ML as a reliable tool in clinical practice, adherence to best practices in , and is crucial. In this work, we summarize and strictly adhere to such practices to ensure reproducible and reliable ML. Specifically, we focus on Alzheimer's Disease (AD) detection, a challenging problem in healthcare. Additionally, we investigate the impact of modeling choices, including different data augmentation techniques and model complexity, on overall performance.
We use Magnetic Resonance Imaging (MRI) data from the ADNI corpus to address a binary classification problem using 3D Convolutional Neural Networks (CNNs). Data processing and modeling are specifically tailored to address data scarcity and minimize computational overhead. Within this framework, we train 15 predictive models, combining three different data augmentation strategies with five distinct 3D CNN architectures that differ in their number of convolutional layers. The augmentation strategies involve affine transformations, such as , and , applied either concurrently or separately.
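As a rough illustration of the setup described above, the following Python sketch enumerates a 3 x 5 experimental grid and contrasts applying affine transformations concurrently (composed into a single transform) with applying them separately (one transform per augmented copy). All names here are hypothetical: the strategy and architecture labels are placeholders, and the scaling, translation, and rotation matrices are generic examples of affine maps, not necessarily the transformations or parameters used by the authors.

```python
import itertools
import math

# Placeholder labels: the abstract states only the counts (3 strategies, 5 architectures).
AUGMENTATION_STRATEGIES = ("strategy_1", "strategy_2", "strategy_3")
ARCHITECTURES = ("cnn_1", "cnn_2", "cnn_3", "cnn_4", "cnn_5")


def experiment_grid():
    """Return every (augmentation strategy, architecture) pair to train."""
    return list(itertools.product(AUGMENTATION_STRATEGIES, ARCHITECTURES))


def identity():
    """4x4 identity matrix in homogeneous coordinates."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def matmul4(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def scaling(s):
    """Uniform scaling by factor s (illustrative affine transform)."""
    m = identity()
    for i in range(3):
        m[i][i] = s
    return m


def translation(tx, ty, tz):
    """Translation by (tx, ty, tz) (illustrative affine transform)."""
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m


def rotation_z(theta):
    """Rotation by theta radians about the z axis (illustrative affine transform)."""
    c, s = math.cos(theta), math.sin(theta)
    m = identity()
    m[0][0], m[0][1] = c, -s
    m[1][0], m[1][1] = s, c
    return m


def concurrent(transforms):
    """Compose all transforms, applied in list order, into one matrix.

    Yields a single augmented copy per input volume.
    """
    composed = identity()
    for t in transforms:
        composed = matmul4(t, composed)  # later transforms act on earlier results
    return [composed]


def separate(transforms):
    """Keep each transform on its own: one augmented copy per transform."""
    return list(transforms)


def apply_affine(matrix, point):
    """Apply a 4x4 affine matrix to a 3D point."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(matrix[i][j] * vec[j] for j in range(4)) for i in range(3))
```

In this sketch, `concurrent` returns one composed matrix and `separate` returns one matrix per transformation, so the two modes differ in how many augmented copies each original volume produces. In practice the matrices would be used to resample a 3D volume rather than to move a single point.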
The combined effect of data augmentation and model complexity results in up to 10% variation in prediction accuracy. Notably, when affine transformations are applied separately, the model achieves higher accuracy, regardless of the chosen architecture. Across all strategies, model accuracy exhibits a concave behavior as the number of convolutional layers increases, peaking at an intermediate value. The best model reaches excellent performance on both the internal testing set and an additional external testing set.
Our work underscores the critical importance of adhering to rigorous experimental practices in the field of ML applied to healthcare. The results clearly demonstrate how data augmentation and model depth, two often overlooked factors, can dramatically impact final performance if not thoroughly investigated. This highlights both the necessity of exploring neglected modeling aspects and the need to comprehensively report all modeling choices to ensure reproducibility and facilitate meaningful comparisons across studies.