Zarfati Mor, Nadkarni Girish N, Glicksberg Benjamin S, Harats Moti, Greenberger Shoshana, Klang Eyal, Soffer Shelly
Department of Internal Medicine, Soroka University Medical Center, Beer-Sheva 84101, Israel.
Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
J Clin Med. 2024 Dec 9;13(23):7480. doi: 10.3390/jcm13237480.
This systematic review evaluates the current applications, advantages, and challenges of large language models (LLMs) in melanoma care. A systematic search of the PubMed and Scopus databases was conducted for studies on the application of LLMs in melanoma published up to 23 July 2024. The review adhered to PRISMA guidelines, and risk of bias was assessed using the modified QUADAS-2 tool. Nine studies were included and categorized into three subgroups: patient education, diagnosis, and clinical management. In patient education, LLMs demonstrated high accuracy, although the readability of their responses often exceeded recommended levels. For diagnosis, multimodal LLMs such as GPT-4V showed some ability to distinguish melanoma from benign lesions, but accuracy varied with factors such as image quality and the integration of clinical context. For management advice, ChatGPT provided more reliable recommendations than other LLMs, but all models lacked the depth needed for individualized decision-making. LLMs, particularly multimodal models, show potential to improve melanoma care. However, current applications require further refinement and validation. Future studies should explore fine-tuning these models on large, diverse dermatological databases and incorporating expert knowledge to address limitations such as limited generalizability across different populations and skin types.