Bron Esther E, Smits Marion, van der Flier Wiesje M, Vrenken Hugo, Barkhof Frederik, Scheltens Philip, Papma Janne M, Steketee Rebecca M E, Méndez Orellana Carolina, Meijboom Rozanna, Pinto Madalena, Meireles Joana R, Garrett Carolina, Bastos-Leite António J, Abdulkadir Ahmed, Ronneberger Olaf, Amoroso Nicola, Bellotti Roberto, Cárdenas-Peña David, Álvarez-Meza Andrés M, Dolph Chester V, Iftekharuddin Khan M, Eskildsen Simon F, Coupé Pierrick, Fonov Vladimir S, Franke Katja, Gaser Christian, Ledig Christian, Guerrero Ricardo, Tong Tong, Gray Katherine R, Moradi Elaheh, Tohka Jussi, Routier Alexandre, Durrleman Stanley, Sarica Alessia, Di Fatta Giuseppe, Sensi Francesco, Chincarini Andrea, Smith Garry M, Stoyanov Zhivko V, Sørensen Lauge, Nielsen Mads, Tangaro Sabina, Inglese Paolo, Wachinger Christian, Reuter Martin, van Swieten John C, Niessen Wiro J, Klein Stefan
Biomedical Imaging Group Rotterdam, Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands; Biomedical Imaging Group Rotterdam, Department of Radiology, Erasmus MC, Rotterdam, The Netherlands.
Department of Radiology, Erasmus MC, Rotterdam, The Netherlands.
Neuroimage. 2015 May 1;111:562-79. doi: 10.1016/j.neuroimage.2015.01.048. Epub 2015 Jan 31.
Algorithms for computer-aided diagnosis of dementia based on structural MRI have demonstrated high performance in the literature, but are difficult to compare as different data sets and methodology were used for evaluation. In addition, it is unclear how the algorithms would perform on previously unseen data, and thus, how they would perform in clinical practice when there is no real opportunity to adapt the algorithm to the data at hand. To address these comparability, generalizability and clinical applicability issues, we organized a grand challenge that aimed to objectively compare algorithms based on a clinically representative multi-center data set. Using clinical practice as the starting point, the goal was to reproduce the clinical diagnosis. Therefore, we evaluated algorithms for multi-class classification of three diagnostic groups: patients with probable Alzheimer's disease, patients with mild cognitive impairment and healthy controls. The diagnosis based on clinical criteria was used as reference standard, as it was the best available reference despite its known limitations. For evaluation, a previously unseen test set was used consisting of 354 T1-weighted MRI scans with the diagnoses blinded. Fifteen research teams participated with a total of 29 algorithms. The algorithms were trained on a small training set (n=30) and optionally on data from other sources (e.g., the Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers and Lifestyle flagship study of aging). The best performing algorithm yielded an accuracy of 63.0% and an area under the receiver-operating-characteristic curve (AUC) of 78.8%. In general, the best performances were achieved using feature extraction based on voxel-based morphometry or a combination of features that included volume, cortical thickness, shape and intensity. The challenge is open for new submissions via the web-based framework: http://caddementia.grand-challenge.org.
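The two reported figures, accuracy and AUC for a three-class problem, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the abstract: the label names `CN`, `MCI` and `AD`, the function names, and the choice of an unweighted one-vs-rest average of per-class AUCs are all illustrative, and the challenge may define its multi-class AUC differently. Each algorithm is assumed to output one probability per class for every scan.

```python
from typing import List, Sequence

# Hypothetical label names for the three diagnostic groups (assumption).
LABELS = ("CN", "MCI", "AD")


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of scans whose predicted class equals the reference diagnosis."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case scores higher than a
    negative case, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true: Sequence[str],
                    probs: Sequence[Sequence[float]],
                    labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-class one-vs-rest AUCs (an assumed definition)."""
    aucs: List[float] = []
    for k, label in enumerate(labels):
        is_pos = [t == label for t in y_true]
        scores = [row[k] for row in probs]
        aucs.append(binary_auc(is_pos, scores))
    return sum(aucs) / len(aucs)


def predict(probs: Sequence[Sequence[float]],
            labels: Sequence[str] = LABELS) -> List[str]:
    """Assign each scan to the class with the highest probability."""
    return [labels[max(range(len(labels)), key=lambda k: row[k])] for row in probs]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = ["CN", "MCI", "AD", "CN"]
    probs = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]
    y_pred = predict(probs)
    print("accuracy:", accuracy(y_true, y_pred))
    print("one-vs-rest AUC:", one_vs_rest_auc(y_true, probs))
```

Accuracy only looks at the single predicted class, whereas AUC uses the full ranking of the class probabilities, which is one reason the two figures can differ as much as the reported 63.0% and 78.8%.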