Institute of Radiology and Nuclear Medicine, Cantonal Hospital Baselland, Liestal, Switzerland.
Division of Radiology, Istituto Dermopatico dell'Immacolata (IDI) IRCCS, Rome, Italy.
Eur Radiol. 2024 Apr;34(4):2791-2804. doi: 10.1007/s00330-023-10217-x. Epub 2023 Sep 21.
To investigate the intra- and inter-rater reliability of the total radiomics quality score (RQS) and the reproducibility of individual RQS items' scores in a large multireader study.
Nine raters with different backgrounds were randomly assigned to three groups based on their proficiency with RQS utilization: groups 1 and 2 represented the inter-rater reliability groups with and without prior training in RQS, respectively; group 3 represented the intra-rater reliability group. Thirty-three original research papers on radiomics were evaluated by the raters of groups 1 and 2. Of the 33 papers, 17 were evaluated twice, with an interval of 1 month, by the raters of group 3. The intraclass correlation coefficient (ICC) was used for continuous variables, and Fleiss' and Cohen's kappa (k) statistics for categorical variables.
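Cohen's kappa, one of the agreement statistics named above, corrects observed agreement between two raters for the agreement expected by chance from each rater's marginal category frequencies: k = (p_o − p_e) / (1 − p_e). A minimal sketch of this calculation, using hypothetical binary item-level ratings (the data below are illustrative, not the study's):

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical scores on the same items."""
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    # observed proportion of agreement
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement from each rater's marginal category frequencies
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(r1) | set(r2))
    return (p_o - p_e) / (1 - p_e)

# hypothetical ratings of one RQS item across 10 papers
# (1 = criterion fulfilled, 0 = not fulfilled)
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(round(cohen_kappa(rater_a, rater_b), 3))  # chance-corrected agreement
```

A kappa of 0 indicates chance-level agreement and 1 perfect agreement; negative values (as in the range reported below) indicate agreement worse than chance. Fleiss' kappa generalizes the same idea to more than two raters.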
The inter-rater reliability was poor to moderate for the total RQS (ICC 0.30-0.55, p < 0.001) and very low to good for the reproducibility of individual items' scores (k = -0.12 to 0.75) within groups 1 and 2, for both inexperienced and experienced raters. The intra-rater reliability for the total RQS was moderate for the less experienced rater (ICC 0.522, p = 0.009), whereas experienced raters showed excellent intra-rater reliability (ICC 0.91-0.99, p < 0.001) between the first and second reads. Intra-rater reliability on the reproducibility of RQS items' scores was higher, and most items showed moderate to good intra-rater reliability (k = 0.40 to 1).
Reproducibility of the total RQS and of individual RQS items' scores is low. A robust and reproducible method is needed to assess the quality of radiomics research.
There is a need for reproducible scoring systems to improve the quality of radiomics research and thereby close the translational gap between research and clinical implementation.
• The radiomics quality score has been widely used for the evaluation of radiomics studies.
• Although intra-rater reliability was moderate to excellent, inter-rater reliability of the total score and of point-by-point scores was low with the radiomics quality score.
• A robust, easy-to-use scoring system is needed for the evaluation of radiomics research.