Exploring Generative Pre-Trained Transformer-4-Vision for Nystagmus Classification: Development and Validation of a Pupil-Tracking Process.

Author Information

Noda Masao, Koshu Ryota, Tsunoda Reiko, Ogihara Hirofumi, Kamo Tomohiko, Ito Makoto, Fushiki Hiroaki

Affiliations

Department of Otolaryngology, Mejiro University Ear Institute Clinic, 320 Ukiya, Iwatsuki-ku, Saitama-shi, Saitama, 339-8501, Japan, 81 48 797 3341.

Department of Otolaryngology, Jichi Medical University, Shimotsuke, Japan.

Publication Information

JMIR Form Res. 2025 Jun 6;9:e70070. doi: 10.2196/70070.

Abstract

BACKGROUND

Conventional nystagmus classification methods often rely on subjective observation by specialists, which is time-consuming and variable among clinicians. Recently, deep learning techniques have been used to automate nystagmus classification using convolutional and recurrent neural networks. These networks can accurately classify nystagmus patterns using video data. However, associated challenges include the need for large datasets when creating models, limited applicability to specific image conditions, and the complexity of using these models.

OBJECTIVE

This study aimed to evaluate a novel approach for nystagmus classification that used the Generative Pre-trained Transformer 4 Vision (GPT-4V) model, which is a state-of-the-art large-scale language model with powerful image recognition capabilities.

METHODS

We developed a pupil-tracking process using a nystagmus-recording video and verified the optimization model's accuracy using GPT-4V classification and nystagmus recording. We tested whether the created optimization model could be evaluated in six categories of nystagmus: right horizontal, left horizontal, upward, downward, right torsional, and left torsional. The traced trajectory was input as two-dimensional coordinate data or an image, and multiple in-context learning methods were evaluated.
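The pupil-tracking step described above can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation: it assumes the pupil appears as the darkest region of each grayscale frame, thresholds each frame, and takes the centroid of the dark pixels as the pupil position, yielding the two-dimensional coordinate trajectory that the study feeds to GPT-4V. The frame data and threshold value are toy assumptions.

```python
def pupil_centroid(frame, threshold=50):
    """Return the (x, y) centroid of pixels darker than `threshold`, or None."""
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value < threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no pupil-like dark region found in this frame
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def trace_trajectory(frames, threshold=50):
    """Track the pupil across frames, returning the 2D coordinate sequence."""
    return [c for c in (pupil_centroid(f, threshold) for f in frames) if c]

# Two toy 4x4 grayscale frames: the dark (pupil) pixel moves one step right,
# as it would during the slow phase of a horizontal nystagmus beat.
frame1 = [[255, 255, 255, 255],
          [255,  10, 255, 255],
          [255, 255, 255, 255],
          [255, 255, 255, 255]]
frame2 = [[255, 255, 255, 255],
          [255, 255,  10, 255],
          [255, 255, 255, 255],
          [255, 255, 255, 255]]
trajectory = trace_trajectory([frame1, frame2])
print(trajectory)  # → [(1.0, 1.0), (2.0, 1.0)]
```

In practice a clinical pipeline would use infrared video and a robust detector (e.g., ellipse fitting) rather than a fixed threshold, but the output in both cases is the same kind of per-frame coordinate sequence, which can then be plotted as a still trajectory image for the classifier.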

RESULTS

The developed model showed an overall classification accuracy of 37% when using pupil-traced images and a maximum accuracy of 24.6% when pupil coordinates were used as input. Regarding orientation, we achieved a maximum accuracy of 69% for the classification of horizontal nystagmus patterns but a lower accuracy for the vertical and torsional components.

CONCLUSIONS

We demonstrated the potential of versatile vertigo management with a generative artificial intelligence model that improves the accuracy and efficiency of nystagmus classification. We also highlighted areas for further improvement, such as expanding the dataset size and enhancing input modalities, to improve classification performance across all nystagmus types. The GPT-4V model, validated only for still-image recognition, can thus be linked to video classification and is proposed as a novel method.
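Linking a still-image model to video classification, as the study describes, amounts to rendering each traced trajectory as a single image and sending it to GPT-4V together with labelled example images (in-context learning). The sketch below only builds such a request payload in the OpenAI chat-completions format; the model name, prompt wording, and example data are assumptions, not the authors' exact protocol.

```python
import base64
import json

# The six nystagmus categories evaluated in the study.
CATEGORIES = ["right horizontal", "left horizontal", "upward",
              "downward", "right torsional", "left torsional"]

def image_part(jpeg_bytes):
    """Wrap raw JPEG bytes as a base64 data-URL content part."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}

def build_request(examples, test_image):
    """examples: list of (jpeg_bytes, label) in-context pairs;
    test_image: JPEG bytes of the trajectory image to classify."""
    content = [{"type": "text",
                "text": "Classify the pupil trajectory image into one of: "
                        + ", ".join(CATEGORIES) + "."}]
    for jpeg, label in examples:  # few-shot, in-context examples
        content.append(image_part(jpeg))
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append(image_part(test_image))  # the trajectory to classify
    return {"model": "gpt-4-vision-preview",
            "messages": [{"role": "user", "content": content}]}

# Toy bytes stand in for real JPEG-encoded trajectory plots.
request = build_request([(b"\xff\xd8example", "right horizontal")],
                        b"\xff\xd8test")
print(json.dumps(request)[:60])
```

The resulting dictionary can be sent to the chat-completions endpoint; adding or removing labelled example pairs is how the different in-context learning conditions mentioned in the methods would be varied.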


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d82d/12164947/a3e85d993a8c/formative-v9-e70070-g001.jpg
