Department of Internal Medicine, Cleveland Clinic, Cleveland, OH 44195, USA.
Department of Infectious Diseases, Cleveland Clinic, Cleveland, OH 44195, USA.
Future Microbiol. 2024;19(15):1283-1292. doi: 10.1080/17460913.2024.2381967. Epub 2024 Jul 29.
This study assessed the visual accuracy of two large language models (LLMs) in microbial classification. GPT-4o and Gemini 1.5 Pro were evaluated on distinguishing Gram-positive from Gram-negative bacteria and classifying them as cocci or bacilli, using 80 Gram stain images from a labeled database. GPT-4o achieved 100% accuracy in simultaneously identifying Gram stain and shape for three of the organisms tested. Gemini 1.5 Pro showed more variability for the same organisms (45, 100 and 95%, respectively). Both LLMs failed to identify Gram stain and bacterial shape together for at least one other organism. Cumulative accuracy plots indicated that GPT-4o performed as well as or better than Gemini 1.5 Pro in every identification task, except for the shape of one organism. These results suggest that these LLMs in their unprimed state are not ready to be implemented in clinical practice and highlight the need for more research with larger datasets to improve LLMs' effectiveness in clinical microbiology.
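The abstract does not say how the cumulative accuracy plots were computed. A minimal sketch, assuming cumulative accuracy means the running fraction of correct identifications after each image is evaluated in sequence, could look like the following. The function name `cumulative_accuracy` and the example outcomes are hypothetical and are not data from the study.

```python
from itertools import accumulate


def cumulative_accuracy(outcomes):
    """Return the running accuracy after each image.

    `outcomes` is a sequence of per-image results, truthy when the model's
    answer matched the database label and falsy otherwise.
    """
    # Running count of correct answers after each image.
    running_correct = accumulate(int(bool(o)) for o in outcomes)
    # Divide each running count by the number of images seen so far.
    return [correct / seen for seen, correct in enumerate(running_correct, start=1)]


if __name__ == "__main__":
    # Hypothetical outcomes for illustration only; not results from the paper.
    example = [True, True, False, True]
    print(cumulative_accuracy(example))
```

Plotting one such curve per model against the image index would allow the kind of side-by-side comparison the abstract describes, under the stated assumption.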