Noda Masao, Yoshimura Hidekane, Okubo Takuya, Koshu Ryota, Uchiyama Yuki, Nomura Akihiro, Ito Makoto, Takumi Yutaka
Department of Otolaryngology, Head and Neck Surgery, Jichi Medical University, Shimotsuke, Japan.
Department of Otolaryngology - Head and Neck Surgery, Shinshu University, Matsumoto, Japan.
JMIR AI. 2024 May 31;3:e58342. doi: 10.2196/58342.
The integration of artificial intelligence (AI), particularly deep learning models, has transformed the landscape of medical technology, especially in the field of diagnosis using imaging and physiological data. In otolaryngology, AI has shown promise in image classification for middle ear diseases. However, existing models often lack patient-specific data and clinical context, limiting their universal applicability. The emergence of GPT-4 Vision (GPT-4V) has enabled a multimodal diagnostic approach, integrating language processing with image analysis.
In this study, we investigated the effectiveness of GPT-4V in diagnosing middle ear diseases by integrating patient-specific data with otoscopic images of the tympanic membrane.
This study comprised two phases: (1) establishing a model with appropriate prompts and (2) validating the ability of the optimal prompt model to classify images. In total, 305 otoscopic images of 4 middle ear diseases (acute otitis media, middle ear cholesteatoma, chronic otitis media, and otitis media with effusion) were obtained from patients who visited Shinshu University or Jichi Medical University between April 2010 and December 2023. The GPT-4V settings were optimized using prompts and patients' data, and the model created with the optimal prompt was then used to verify the diagnostic accuracy of GPT-4V on 190 images. To compare the diagnostic accuracy of GPT-4V with that of physicians, 30 clinicians completed a web-based questionnaire consisting of 190 images.
The multimodal AI approach achieved an accuracy of 82.1%, which was higher than that of certified pediatricians (70.6%) but lower than that of otolaryngologists (more than 95%). The model's disease-specific accuracy rates were 89.2% for acute otitis media, 76.5% for chronic otitis media, 79.3% for middle ear cholesteatoma, and 85.7% for otitis media with effusion, which highlights the need for disease-specific optimization. Comparisons with physicians revealed promising results, suggesting the potential of GPT-4V to augment clinical decision-making.
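As an illustration of how overall and disease-specific accuracy figures of this kind can be computed, the following is a minimal sketch. It assumes that disease-specific accuracy means the fraction of images of each disease that were labeled correctly. The function name `accuracy_report` and the toy labels are hypothetical and are not the study's data or code.

```python
from collections import defaultdict


def accuracy_report(true_labels, predicted_labels):
    """Return (overall accuracy, per-disease accuracy) for paired label lists.

    Per-disease accuracy is assumed here to be the share of images of a
    given true diagnosis that received the correct predicted diagnosis.
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")

    total = defaultdict(int)    # number of images per true diagnosis
    correct = defaultdict(int)  # correctly classified images per diagnosis
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1

    overall = sum(correct.values()) / len(true_labels)
    per_disease = {label: correct[label] / total[label] for label in total}
    return overall, per_disease


# Hypothetical toy labels for illustration only (not study data).
true_labels = ["AOM", "AOM", "COM", "OME", "cholesteatoma", "OME"]
predicted_labels = ["AOM", "OME", "COM", "OME", "COM", "OME"]

overall, per_disease = accuracy_report(true_labels, predicted_labels)
print(f"overall accuracy: {overall:.3f}")
for disease, value in per_disease.items():
    print(f"{disease}: {value:.3f}")
```

Reporting the per-disease values alongside the overall figure makes it visible when a single class pulls the aggregate down.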
Despite these advantages, challenges such as data privacy and ethical considerations must be addressed. Overall, this study underscores the potential of multimodal AI for enhancing diagnostic accuracy and improving patient care in otolaryngology. Further research is warranted to optimize and validate this approach in diverse clinical settings.