Exploring Generative Pre-Trained Transformer-4-Vision for Nystagmus Classification: Development and Validation of a Pupil-Tracking Process.

Author Information

Noda Masao, Koshu Ryota, Tsunoda Reiko, Ogihara Hirofumi, Kamo Tomohiko, Ito Makoto, Fushiki Hiroaki

Affiliations

Department of Otolaryngology, Mejiro University Ear Institute Clinic, 320 Ukiya, Iwatsuki-ku, Saitama-shi, Saitama, 339-8501, Japan, 81 48 797 3341.

Department of Otolaryngology, Jichi Medical University, Shimotsuke, Japan.

Publication Information

JMIR Form Res. 2025 Jun 6;9:e70070. doi: 10.2196/70070.

Abstract

BACKGROUND

Conventional nystagmus classification methods often rely on subjective observation by specialists, which is time-consuming and variable among clinicians. Recently, deep learning techniques have been used to automate nystagmus classification using convolutional and recurrent neural networks. These networks can accurately classify nystagmus patterns using video data. However, associated challenges include the need for large datasets when creating models, limited applicability to specific image conditions, and the complexity of using these models.

OBJECTIVE

This study aimed to evaluate a novel approach for nystagmus classification that used the Generative Pre-trained Transformer 4 Vision (GPT-4V) model, which is a state-of-the-art large-scale language model with powerful image recognition capabilities.

METHODS

We developed a pupil-tracking process using a nystagmus-recording video and verified the optimization model's accuracy using GPT-4V classification and nystagmus recording. We tested whether the created optimization model could be evaluated in six categories of nystagmus: right horizontal, left horizontal, upward, downward, right torsional, and left torsional. The traced trajectory was input as two-dimensional coordinate data or an image, and multiple in-context learning methods were evaluated.
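The pupil-tracking step described above can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation: it assumes the pupil appears as the darkest region of each grayscale frame, thresholds each frame, and takes the centroid of the dark pixels as the pupil position, yielding the two-dimensional coordinate trajectory that the study feeds to GPT-4V. The frame data and threshold value are toy assumptions.

```python
def pupil_centroid(frame, threshold=50):
    """Return the (x, y) centroid of pixels darker than `threshold`, or None."""
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value < threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no pupil-like dark region found in this frame
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def trace_trajectory(frames, threshold=50):
    """Track the pupil across frames, returning the 2D coordinate sequence."""
    return [c for c in (pupil_centroid(f, threshold) for f in frames) if c]

# Two toy 4x4 grayscale frames: the dark (pupil) pixel moves one step right,
# as it would during the slow phase of a horizontal nystagmus beat.
frame1 = [[255, 255, 255, 255],
          [255,  10, 255, 255],
          [255, 255, 255, 255],
          [255, 255, 255, 255]]
frame2 = [[255, 255, 255, 255],
          [255, 255,  10, 255],
          [255, 255, 255, 255],
          [255, 255, 255, 255]]
trajectory = trace_trajectory([frame1, frame2])
print(trajectory)  # → [(1.0, 1.0), (2.0, 1.0)]
```

In practice a clinical pipeline would use infrared video and a robust detector (e.g., ellipse fitting) rather than a fixed threshold, but the output in both cases is the same kind of per-frame coordinate sequence, which can then be plotted as a still trajectory image for the classifier.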

RESULTS

The developed model showed an overall classification accuracy of 37% when using pupil-traced images and a maximum accuracy of 24.6% when pupil coordinates were used as input. Regarding orientation, we achieved a maximum accuracy of 69% for the classification of horizontal nystagmus patterns but a lower accuracy for the vertical and torsional components.

CONCLUSIONS

We demonstrated the potential of versatile vertigo management with a generative artificial intelligence model that improves the accuracy and efficiency of nystagmus classification. We also highlighted areas for further improvement, such as expanding the dataset size and enhancing input modalities, to improve classification performance across all nystagmus types. The GPT-4V model, validated only for still-image recognition, can thus be linked to video classification and is proposed as a novel method.
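Linking a still-image model to video classification, as the study describes, amounts to rendering each traced trajectory as a single image and sending it to GPT-4V together with labelled example images (in-context learning). The sketch below only builds such a request payload in the OpenAI chat-completions format; the model name, prompt wording, and example data are assumptions, not the authors' exact protocol.

```python
import base64
import json

# The six nystagmus categories evaluated in the study.
CATEGORIES = ["right horizontal", "left horizontal", "upward",
              "downward", "right torsional", "left torsional"]

def image_part(jpeg_bytes):
    """Wrap raw JPEG bytes as a base64 data-URL content part."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}

def build_request(examples, test_image):
    """examples: list of (jpeg_bytes, label) in-context pairs;
    test_image: JPEG bytes of the trajectory image to classify."""
    content = [{"type": "text",
                "text": "Classify the pupil trajectory image into one of: "
                        + ", ".join(CATEGORIES) + "."}]
    for jpeg, label in examples:  # few-shot, in-context examples
        content.append(image_part(jpeg))
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append(image_part(test_image))  # the trajectory to classify
    return {"model": "gpt-4-vision-preview",
            "messages": [{"role": "user", "content": content}]}

# Toy bytes stand in for real JPEG-encoded trajectory plots.
request = build_request([(b"\xff\xd8example", "right horizontal")],
                        b"\xff\xd8test")
print(json.dumps(request)[:60])
```

The resulting dictionary can be sent to the chat-completions endpoint; adding or removing labelled example pairs is how the different in-context learning conditions mentioned in the methods would be varied.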


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d82d/12164947/a3e85d993a8c/formative-v9-e70070-g001.jpg
