Chen Ziman, Chambara Nonhlanhla, Liu Shirley Yuk Wah, Chow Tom Chi Man, Lai Carol Man Sze, Ying Michael Tin Cheung
Department of Health Technology and Informatics, The Hong Kong Polytechnic University, 11 Yuk Choi Rd., Hung Hom, Kowloon, Hong Kong.
School of Healthcare Sciences, Cardiff University, Cardiff CF14 4XN, UK.
Cancers (Basel). 2025 Jun 20;17(13):2068. doi: 10.3390/cancers17132068.
Recent advancements in large language models, such as ChatGPT-4o, have created new opportunities for analyzing complex multi-modal data, including medical images. This study aims to assess the potential of ChatGPT-4o in distinguishing between benign and malignant thyroid nodules via multi-modality ultrasound imaging: grayscale ultrasound, color Doppler ultrasound (CDUS), and shear wave elastography (SWE). Patients who underwent thyroid nodule ultrasound examinations and had confirmed pathological diagnoses were included. ChatGPT-4o analyzed the multi-modality ultrasound data using two approaches: (1) a dual-modality strategy that employed grayscale ultrasound and CDUS, and (2) a triple-modality strategy that incorporated grayscale ultrasound, CDUS, and SWE. Diagnostic performance was compared against pathological findings using receiver operating characteristic (ROC) curve analysis, while consistency was evaluated through kappa (κ) analysis. A total of 106 thyroid nodules were evaluated; 65.1% were benign and 34.9% malignant. In the dual-modality approach, ChatGPT-4o achieved an area under the ROC curve (AUC) of 66.3%, moderate agreement with pathology results (κ = 0.298), a sensitivity of 70.3%, a specificity of 62.3%, and an accuracy of 65.1%. In contrast, the triple-modality approach exhibited higher specificity at 97.1% but much lower sensitivity at 18.9%, with an accuracy of 69.8%, reduced overall agreement (κ = 0.194), and an AUC of 58.0%. ChatGPT-4o shows some potential for classifying thyroid nodules using multi-modality ultrasound imaging. However, the dual-modality approach unexpectedly outperformed the triple-modality approach in AUC, agreement, and sensitivity. This indicates that ChatGPT-4o might encounter challenges in integrating and prioritizing different data modalities, particularly when conflicting information is present, which could impact diagnostic effectiveness.
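The reported figures can be cross-checked with a short calculation. The sketch below is illustrative only: the abstract does not give the confusion-matrix counts, so the values used here (37 malignant and 69 benign nodules, with the true/false positive and negative counts for each strategy) are assumptions back-calculated from the reported percentages. The function name `diagnostic_metrics` is hypothetical. The AUC line assumes a single binary benign/malignant call per nodule, in which case the ROC curve has one operating point and its trapezoidal area equals the mean of sensitivity and specificity; the abstract does not state how the authors computed the AUC.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic metrics from a 2x2 confusion matrix.

    tp/fn: malignant nodules called malignant/benign.
    tn/fp: benign nodules called benign/malignant.
    """
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    # Cohen's kappa: observed agreement corrected for chance agreement.
    kappa = (accuracy - expected) / (1 - expected)
    # Assumption: one binary operating point, so the trapezoidal AUC
    # reduces to the mean of sensitivity and specificity.
    auc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
        "auc": auc,
    }


# Counts inferred from the reported percentages (not stated in the abstract).
dual = diagnostic_metrics(tp=26, fn=11, tn=43, fp=26)
triple = diagnostic_metrics(tp=7, fn=30, tn=67, fp=2)

for name, m in (("dual", dual), ("triple", triple)):
    print(
        f"{name}: sens={m['sensitivity']:.1%} spec={m['specificity']:.1%} "
        f"acc={m['accuracy']:.1%} kappa={m['kappa']:.3f} auc={m['auc']:.1%}"
    )
```

Under these inferred counts, the triple-modality strategy labels only 9 of 106 nodules as malignant, which would explain how it can have slightly higher accuracy than the dual-modality strategy while scoring lower on both κ and AUC.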