Bhalerao Gaurav, Gillis Grace, Dembele Mohamed, Suri Sana, Ebmeier Klaus, Klein Johannes, Hu Michele, Mackay Clare, Griffanti Ludovica
Oxford Centre for Human Brain Activity, Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, University of Oxford, Oxford, United Kingdom.
Department of Psychiatry, University of Oxford, Oxford, United Kingdom.
Imaging Neurosci (Camb). 2025 May 28;3. doi: 10.1162/IMAG.a.4. eCollection 2025.
T1-weighted (T1w) MRI is widely used in clinical neuroimaging for studying brain structure and its changes, including those related to neurodegenerative diseases, and as an anatomical reference for analysing other modalities. Ensuring high-quality T1w scans is vital, as image quality affects the reliability of outcome measures. However, visual inspection can be subjective and time-consuming, especially with large datasets. The effectiveness of automated quality control (QC) tools for clinical cohorts remains uncertain. In this study, we used T1w scans from elderly participants within ageing and clinical populations to test the accuracy of existing QC tools with respect to visual QC and to establish a new quality prediction framework for clinical research use. Four datasets acquired from multiple scanners and sites were used (N = 2438, 11 sites, 39 scanner manufacturer models, 3 field strengths: 1.5T, 3T, 2.9T, patients and controls, average age 71 ± 8 years). All structural T1w scans were processed with two standard automated QC pipelines (MRIQC and CAT12). The agreement of the accept-reject ratings was compared between the automated pipelines and with visual QC. We then designed a quality prediction framework that combines the QC measures from the existing automated tools and is trained on clinical research datasets. We tested the classifier performance using cross-validation on data from all sites together, also examining the performance across diagnostic groups. We then tested the generalisability of our approach when leaving one site out and explored how well our approach generalises to data from a different scanner manufacturer and/or field strength from those used for training, as well as to an unseen new dataset of healthy young participants with movement-related artefacts.
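The leave-one-site-out evaluation described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `leave_one_site_out` and the toy site labels are hypothetical, and the sketch only shows how scans would be partitioned so that each site is held out in turn while the classifier is trained on the remaining sites.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) for each site.

    `sites` lists the acquisition site of every scan, in scan order.
    """
    # Group scan indices by the site they were acquired at.
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)

    # Hold out one site at a time; train on all scans from the other sites.
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        train_idx = [
            i for site in sorted(by_site) if site != held_out
            for i in by_site[site]
        ]
        yield held_out, train_idx, test_idx


# Hypothetical toy example: seven scans acquired at three sites.
scan_sites = ["A", "A", "B", "C", "B", "C", "C"]
folds = list(leave_one_site_out(scan_sites))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because no scan from the held-out site is seen during training, the per-fold score reflects performance on a genuinely unseen site rather than on new scans from a familiar one.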
Our results show significant agreement between automated QC tools and visual QC (Kappa = 0.30 with MRIQC predictions; Kappa = 0.28 with CAT12's rating) when considering the entire dataset, but the agreement was highly variable across datasets. Our proposed robust undersampling boost (RUS) classifier achieved 87.7% balanced accuracy on the test data combined from different sites (with 86.6% and 88.3% balanced accuracy on scans from patients and controls, respectively). This classifier was also found to generalise across different combinations of training and test datasets (average balanced accuracy of leave-one-site-out = 78.2%; exploratory models on field strengths and manufacturers = 77.7%; movement-related artefact dataset when including 1% of its scans in training = 88.5%). While existing QC tools may not be robustly applicable to datasets comprising older adults, they produce quality metrics that can be leveraged to train more robust quality control classifiers for ageing and clinical cohorts.
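The two metrics reported above, Cohen's kappa for rater agreement and balanced accuracy for classifier performance, can be computed as in the following sketch. The ratings are invented toy data (1 = accept, 0 = reject), and the function names are illustrative; none of this reproduces the study's data or code.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of scans given the same rating.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    p_chance = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Invented toy ratings: most scans are accepted, few are rejected.
visual_qc = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
automated = [1, 1, 1, 1, 1, 1, 1, 0, 0, 1]

kappa = cohen_kappa(visual_qc, automated)
bal_acc = balanced_accuracy(visual_qc, automated)
print(round(kappa, 3), round(bal_acc, 4))
```

In this toy example the raw agreement is 80%, yet kappa is 0.375 and balanced accuracy is 0.6875, because only one of the two rejected scans was caught. This is why chance-corrected and class-balanced metrics are preferred when rejected scans are rare.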