Keays M A, Guerra L A, Mihill J, Raju G, Al-Asheeri N, Geier P, Gaboury I, Matzinger M, Pike J, Leonard M P
Division of Pediatric Urology, University of Ottawa, Ontario, Canada.
J Urol. 2008 Oct;180(4 Suppl):1680-2; discussion 1682-3. doi: 10.1016/j.juro.2008.03.107. Epub 2008 Aug 16.
The Society for Fetal Urology introduced a subjective grading system for classifying hydronephrosis that has important implications for patient diagnosis, treatment and outcome. The grading system is frequently used to standardize the reporting of hydronephrosis severity and to compare results among patients and centers. Despite its widespread use, to our knowledge no group has investigated the reliability of the grading system since its introduction. We assessed the intrarater and interrater reliability of the Society for Fetal Urology grading system for hydronephrosis and examined levels of agreement by degree of hydronephrosis (grades 0 to 4) and level of experience (staff vs trainee).
A series of 50 pediatric renal ultrasound images from patients with a diagnosis of hydronephrosis were assessed by 4 staff individuals and 4 trainees using the Society for Fetal Urology grading system. Ultrasound images included the kidneys, ureters and bladder to be consistent with practice. After 7 to 14 days each rater repeated the assessment. The nonweighted Cohen kappa statistic was used to estimate intrarater and interrater reliability by Society for Fetal Urology grade and training level.
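The nonweighted Cohen kappa mentioned above compares the observed proportion of agreement between two sets of ratings with the agreement expected by chance from each rater's marginal grade frequencies. A minimal sketch follows, assuming two equal-length lists of grades (0 to 4); the function name `cohen_kappa` and the sample ratings are hypothetical illustrations and are not data from the study.

```python
import math
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Nonweighted Cohen kappa for two equal-length sequences of categorical ratings.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(ratings_a)

    # Observed agreement: fraction of items given the same grade in both ratings.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each grade, the product of the two marginal proportions.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)

    if expected == 1.0:
        # Both sequences use a single identical grade; kappa is undefined.
        return math.nan

    return (observed - expected) / (1.0 - expected)


# Made-up grades for ten renal units, rated twice (illustration only).
first_pass = [0, 1, 2, 3, 4, 2, 1, 0, 3, 4]
second_pass = [0, 1, 2, 2, 4, 2, 1, 1, 3, 4]
kappa = cohen_kappa(first_pass, second_pass)
```

Passing the same rater's two sessions gives an intrarater estimate, while passing two different raters' grades gives an interrater estimate. In the made-up example, 8 of 10 ratings agree (80%) and chance agreement is 20%, so kappa is approximately 0.75.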
Staff and trainee raters independently assigned Society for Fetal Urology grades to 50 patients (99 renal units). The average number of images per ultrasound was 41, including the right and left kidneys. Overall interrater agreement among staff was substantial for grade 0, moderate for grades 1, 2 and 4, and only slight to fair for grade 3. Intrarater agreement was substantial to almost perfect for staff (range 69% to 94%, kappa 0.56 to 0.89) and trainees (range 63% to 90%, kappa 0.48 to 0.85).
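The descriptive labels used above (slight, fair, moderate, substantial, almost perfect) match the vocabulary of the commonly used Landis and Koch scale for kappa. The abstract does not state which scale was applied, so the cutoffs below are an assumption, and the function name `landis_koch_label` is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to a Landis and Koch descriptive label.

    Assumes the conventional cutoffs; the abstract does not specify its scale.
    """
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Endpoints of the staff intrarater kappa range reported in the abstract.
labels = [landis_koch_label(k) for k in (0.56, 0.89)]
```

Under these assumed cutoffs, a kappa of 0.89 maps to "almost perfect" and a kappa of 0.56 maps to "moderate".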
Our study suggests that the Society for Fetal Urology grading system has good intrarater but modest interrater reliability. Individual rater interpretations of the grading system may explain the modest interrater agreement. Proposed modifications to the Society for Fetal Urology classification system, such as distinguishing between diffuse and segmental cortical thinning, may improve reliability.