Phuttharak Warinthorn, Boonrod Arunnit, Klungboonkrong Vivian, Witsawapaisan Thanatchaporn
Department of Radiology, Faculty of Medicine, Khon Kaen University, Thailand.
Asian Pac J Cancer Prev. 2019 Apr 29;20(4):1283-1288. doi: 10.31557/APJCP.2019.20.4.1283.
Background: Thyroid ultrasound (US) is the first-line diagnostic tool for guiding the management of thyroid disease, but it is operator dependent. Few reports have evaluated interrater variability in US assessment. We therefore evaluated interrater reliability in the US assessment of thyroid nodules and estimated diagnostic accuracy under several Thyroid Imaging Reporting and Data System (TIRADS) schemes.
Methods: This retrospective study included 24 malignant nodules and 84 benign nodules from January 2015 to October 2017. Two blinded observers independently reviewed stored US images using TIRADS. All analyses followed the guidelines of the ACR-TIRADS, Siriraj-TIRADS and EU-TIRADS systems. Interrater reliability was calculated using Cohen's kappa statistic (K). Diagnostic accuracy was also calculated.
Results: Among individual US features, interobserver agreement was substantial for composition (K=0.616), fair for echogenicity and echogenic foci (K=0.327 and 0.288, respectively), and slight for margin (K=0.143). For the final assessment, interrater agreement was moderate for the ACR-TIRADS system (K=0.500), fair for the EU-TIRADS system (K=0.209) and slight for the Siriraj-TIRADS system (K=0.114). Diagnostic performance is reported for each of the two observers in turn. For the ACR-TIRADS system, sensitivities were 75% and 79.2%, specificities were 58.3% and 56%, positive predictive values (PPV) were 34% and 33.9%, and negative predictive values (NPV) were 89.1% and 90.4%. For the Siriraj-TIRADS system, sensitivities were 41.7% and 25%, specificities were 84.5% and 89.3%, PPVs were 43.5% and 40%, and NPVs were 83.5% and 80.6%. For the EU-TIRADS system, sensitivities were 45.8% and 66.7%, specificities were 79.8% and 72.6%, PPVs were 39.3% and 41%, and NPVs were 83.8% and 88.4%.
Conclusion: ACR-TIRADS had the highest interobserver agreement and showed a trend toward the highest sensitivity and negative predictive value for the diagnosis of malignant thyroid nodules. Siriraj-TIRADS had higher specificity and accuracy but lower interobserver agreement.
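The predictive values in the Results follow arithmetically from the reported sensitivities and specificities together with the 24 malignant / 84 benign split. The Python sketch below illustrates that relationship. It assumes the 2x2 counts can be recovered by rounding the reported rates; the abstract does not state the counts themselves, so they are an inference, and the function names are illustrative only.

```python
def counts_from_rates(sensitivity, specificity, n_malignant=24, n_benign=84):
    """Infer approximate 2x2 counts from reported rates.

    Assumption: counts are recovered by rounding; they are not given in the abstract.
    """
    tp = round(sensitivity * n_malignant)  # malignant nodules called positive
    tn = round(specificity * n_benign)     # benign nodules called negative
    fn = n_malignant - tp                  # malignant nodules missed
    fp = n_benign - tn                     # benign nodules called positive
    return tp, fp, fn, tn


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard definitions of PPV, NPV and overall accuracy from a 2x2 table."""
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# First observer, ACR-TIRADS: sensitivity 75%, specificity 58.3%
tp, fp, fn, tn = counts_from_rates(0.75, 0.583)
metrics = diagnostic_metrics(tp, fp, fn, tn)
print(tp, fp, fn, tn)  # 18 35 6 49
print(f"PPV={metrics['ppv']:.1%}, NPV={metrics['npv']:.1%}")  # PPV=34.0%, NPV=89.1%
```

Under the rounding assumption, the inferred counts reproduce the reported PPV of 34% and NPV of 89.1% for the first observer. The same calculation can be repeated for the other observer and systems as a consistency check on the reported figures.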
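Cohen's kappa, the agreement statistic used in this study, corrects the observed agreement between two raters for the agreement expected by chance. The Python sketch below shows the calculation. The ratings in it are made-up illustrative labels, not study data. The verbal bands follow the commonly used Landis and Koch scale, which the abstract's labels appear to match but which the abstract does not name.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_chance = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if p_chance == 1:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)


def agreement_label(kappa):
    """Verbal band for kappa (assumed Landis and Koch cut-offs)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical category assignments from two observers (not study data).
observer_1 = ["TR3", "TR4", "TR4", "TR5", "TR3", "TR4"]
observer_2 = ["TR3", "TR4", "TR5", "TR5", "TR4", "TR4"]
kappa = cohen_kappa(observer_1, observer_2)
print(f"{kappa:.3f} ({agreement_label(kappa)})")  # 0.478 (moderate)
```

With these cut-offs, the reported values map to the same labels used in the Results, for example K=0.616 to substantial, K=0.500 to moderate, K=0.209 to fair and K=0.114 to slight.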