Hong Dao-Rong, Huang Chun-Yan, Zhong Huo-Hu, Lyu Guo-Rong
Department of Ultrasonography, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian, China.
Department of General Practice, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian, China.
Front Med (Lausanne). 2025 Jul 30;12:1634976. doi: 10.3389/fmed.2025.1634976. eCollection 2025.
OBJECTIVE: This study aims to evaluate the application of ChatGPT-4 Vision to the ultrasound image analysis of thyroid nodules by comparing its diagnostic efficacy and consistency with those of sonographers.
METHODS: In this prospective study, conducted in real clinical scenarios, we included 124 patients with pathologically confirmed thyroid nodules who underwent ultrasound examinations at The Second Affiliated Hospital of Fujian Medical University. A physician not otherwise involved in the study collected three ultrasound images for each nodule: the maximum cross-sectional view, the maximum longitudinal view, and the section best representing the nodule's characteristics. The images were analyzed by the primed ChatGPT-4 Vision and classified according to the 2020 Chinese Guidelines for Ultrasound Malignancy Risk Stratification of Thyroid Nodules (C-TIRADS). Two sonographers with different qualifications (a resident physician and an attending physician) used the same images to classify the nodules according to the C-TIRADS guidelines. Using fine needle aspiration (FNA) biopsy or surgical pathology results as the gold standard, we compared the consistency and diagnostic efficacy of the primed ChatGPT-4 Vision with those of the sonographers.
RESULTS: (1) ChatGPT-4 Vision diagnosed thyroid nodules with a sensitivity of 86.2%, a specificity of 60.0%, and an AUC of 0.731. This was comparable to the resident's sensitivity of 85.1% (95% CI: 77.2-90.8%), specificity of 66.7% (95% CI: 53.7-77.7%), and AUC of 0.759 (P > 0.05), but lower than the attending physician's sensitivity of 97.9% (95% CI: 93.2-99.5%), specificity of 80.0% (95% CI: 67.7-88.6%), and AUC of 0.889 (95% CI: 81.5-96.4%) (P < 0.05). (2) The primed ChatGPT-4 Vision demonstrated good consistency with the resident in thyroid nodule classification (Kappa value = 0.729), though its consistency with the pathological diagnosis was lower than that of the attending physician (Kappa values of 0.457 vs. 0.816, respectively).
CONCLUSION: The primed ChatGPT-4 Vision demonstrates promising clinical utility in thyroid nodule risk stratification, achieving diagnostic performance comparable to that of a resident physician. Its ability to standardize image analysis aligns with precision medicine goals, offering a foundation for future integration with dynamic ultrasound modalities to enhance pathological correlation.
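The reported AUC values can be cross-checked against the reported sensitivity and specificity. The abstract does not state how the AUC was computed, so the following is only a consistency sketch under one assumption: each reader's classification was dichotomized at a single cutoff, in which case the ROC curve is the polyline (0, 0) -> (1 - specificity, sensitivity) -> (1, 1) and its area equals (sensitivity + specificity) / 2. The function name `single_cutoff_auc` and the `reported` dictionary are illustrative, not taken from the study.

```python
# Values copied from the abstract, as fractions:
# reader -> (sensitivity, specificity, reported AUC)
reported = {
    "ChatGPT-4 Vision": (0.862, 0.600, 0.731),
    "Resident": (0.851, 0.667, 0.759),
    "Attending": (0.979, 0.800, 0.889),
}


def single_cutoff_auc(sensitivity, specificity):
    """Area under the ROC polyline (0,0) -> (1 - spec, sens) -> (1,1).

    Assumption: the classifier output is binary (one cutoff), so the
    trapezoidal area reduces to the mean of sensitivity and specificity.
    """
    return (sensitivity + specificity) / 2


for reader, (sens, spec, auc) in reported.items():
    estimate = single_cutoff_auc(sens, spec)
    print(f"{reader}: estimated AUC {estimate:.4f}, reported {auc:.3f}")
```

Under this assumption the estimates agree with the reported AUCs to within rounding, which suggests (but does not prove) that the published AUCs reflect a single benign/malignant cutoff rather than the full ordinal C-TIRADS scale.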