Upadhyaya Dipak P, Prantzalos Katrina, Golnari Pedram, Shaikh Aasef G, Sivagnanam Subhashini, Majumdar Amitava, Ghasia Fatema F, Sahoo Satya S
Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA.
National VA Parkinson's Consortium Center, Louis Stokes Cleveland VA Medical Center, OH, USA.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:566-575. eCollection 2025.
Amblyopia is a neurodevelopmental disorder affecting children's visual acuity that requires early diagnosis for effective treatment. Traditional diagnostic methods rely on subjective evaluation, by specialized pediatric ophthalmologists, of recordings from high-fidelity eye-tracking instruments; such expertise is often unavailable in rural, low-resource clinics. There is therefore an urgent need for a scalable, low-cost, high-accuracy approach to automatically analyze eye-tracking recordings. Large Language Models (LLMs) show promise for the accurate detection of amblyopia; our prior work has shown that the Google Gemini model, guided by expert ophthalmologists, can distinguish control from amblyopic subjects using eye-tracking recordings. However, there is a clear need to address issues of transparency and trust in medical applications of LLMs. To bolster the reliability and interpretability of LLM analysis of eye-tracking records, we developed a Feature Guided Interpretive Prompting (FGIP) framework focused on critical clinical features. Using the Google Gemini model, we classify high-fidelity eye-tracking data to detect amblyopia in children and apply the Quantus framework to evaluate the classification results across key metrics (faithfulness, robustness, localization, and complexity). These metrics provide a quantitative basis for understanding the model's decision-making process. This work presents the first implementation of an Explainable Artificial Intelligence (XAI) framework to systematically characterize the results generated by the Gemini model using high-fidelity eye-tracking data to detect amblyopia in children. Results demonstrated that the model accurately classified control and amblyopic subjects, including those with nystagmus, while maintaining transparency and clinical alignment.
The results of this study support the development of a scalable and interpretable clinical decision support (CDS) tool using LLMs that has the potential to enhance the trustworthiness of AI applications.