Si Cai-Feng, Yu Jing, Cui Yi-Yang, Huang Yuan-Jing, Cui Ke-Fei, Fu Chao
Department of Ultrasound, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China.
Quant Imaging Med Surg. 2024 Dec 5;14(12):9234-9245. doi: 10.21037/qims-24-282. Epub 2024 Nov 6.
The lack of standardization among risk stratification systems (RSSs) has led to uncertainty about which RSS is most effective for assessing the malignancy risk of thyroid nodules. Therefore, the aim of this study was to compare the diagnostic performance of four current score-based RSSs according to thyroid nodule size, with the goal of determining the most effective RSS and aiding clinical decision-making.
Between July 2013 and January 2019, a total of 2,667 consecutive patients presenting with 3,944 thyroid nodules were pathologically diagnosed after thyroidectomy and/or ultrasound (US)-guided fine-needle aspiration (FNA). These nodules were retrospectively dichotomized into two groups: small nodules (<1 cm) and large nodules (≥1 cm). The four RSSs were used to assign US categories, and the diagnostic performances were computed and compared based on the size of thyroid nodules, both before and after the application of size thresholds for biopsy.
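The performance measures compared in this study can be illustrated with a minimal sketch that derives them from a 2x2 table of RSS result against pathology. The definitions of FNA rate (nodules for which FNA is indicated, over all nodules) and unnecessary FNA rate (benign nodules among those for which FNA is indicated) are assumptions based on common usage, not definitions quoted from the study. The class name, function name, and counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """2x2 table of RSS result versus pathology (hypothetical container)."""
    tp: int  # malignant nodules flagged by the RSS
    fp: int  # benign nodules flagged by the RSS
    tn: int  # benign nodules not flagged
    fn: int  # malignant nodules not flagged


def diagnostic_metrics(c: Counts) -> dict:
    """Return common diagnostic measures as fractions between 0 and 1."""
    total = c.tp + c.fp + c.tn + c.fn
    flagged = c.tp + c.fp  # nodules for which FNA would be indicated
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        # Assumed definition: share of all nodules sent to FNA.
        "fna_rate": flagged / total,
        # Assumed definition: share of FNA-indicated nodules that are benign.
        "unnecessary_fna_rate": c.fp / flagged,
    }


# Illustrative counts only; these are not data from the study.
example = Counts(tp=90, fp=30, tn=70, fn=10)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed definitions, lowering the number of benign nodules flagged for biopsy raises specificity and lowers both FNA rates at once, which is why the abstract reports these measures together.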
After thyroidectomy or biopsy, 1,781 (45.2%) thyroid nodules were found to be malignant. (I) After applying size thresholds for biopsy in ≥1 cm nodules, the highest specificity, accuracy, and area under the curve (AUC) and the lowest FNA rate and unnecessary FNA rate were observed in the Artificial Intelligence-Thyroid Imaging Reporting And Data System (AI-TIRADS) (66.1%, 75.3%, 0.785, 55.1%, and 38.6%, respectively, P<0.05 for all). (II) Before applying size thresholds for biopsy in ≥1 cm nodules, the FNA rate and unnecessary FNA rate of the four RSSs were lower than they were after the application of the size thresholds: American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS), 59.1% versus 61.4%, 39.8% versus 45.4%; AI-TIRADS, 52.3% versus 55.1%, 34.0% versus 38.6%; TIRADS issued by Kwak (Kwak-TIRADS), 52.5% versus 76.1%, 34.4% versus 52.1%; Chinese Thyroid Imaging Reporting and Data System (C-TIRADS), 51.5% versus 66.2%, 34.4% versus 50.1% (P<0.05 for all). (III) The small nodules showed higher sensitivity and lower specificity than the large nodules (ACR-TIRADS, 97.7% versus 95.5%, 46.2% versus 62.5%; AI-TIRADS, 97.2% versus 92.7%, 49.9% versus 71.6%; Kwak-TIRADS, 97.2% versus 92.5%, 49.7% versus 71.3%; C-TIRADS, 94.2% versus 90.7%, 55.0% versus 71.8%, respectively, all P<0.05).
A potentially effective strategy for managing large nodules with the current score-based RSSs could be to rely solely on US categories rather than on size thresholds for biopsy. Additionally, before size thresholds for biopsy were applied, the RSSs showed higher sensitivity and lower specificity in small nodules than in large nodules. These findings suggest a possible new management strategy for large nodules and provide a basis for managing small nodules.