Deep Learning-Based Precision Cropping of Eye Regions in Strabismus Photographs: Algorithm Development and Validation Study for Workflow Optimization
Authors
Wu Dawen, Li Yanfei, Yang Zeyi, Yin Teng, Chen Xiaohang, Liu Jingyu, Shang Wenyi, Xie Bin, Yang Guoyuan, Zhang Haixian, Liu Longqian
Affiliations
Department of Ophthalmology, West China Hospital, Sichuan University, 37 Guoxue Xiang (Alley), Chengdu, Sichuan Province, 610041, China.
Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, 610041, China.
Publication
J Med Internet Res. 2025 Jul 17;27:e74402. doi: 10.2196/74402.
BACKGROUND
Traditional ocular gaze photograph preprocessing, relying on manual cropping and head tilt correction, is time-consuming and inconsistent, limiting artificial intelligence (AI) model development and clinical application.
OBJECTIVE
This study aimed to address these challenges using an advanced preprocessing algorithm to enhance the accuracy, efficiency, and standardization of eye region cropping for clinical workflows and AI data preprocessing.
METHODS
This retrospective and prospective cross-sectional study utilized 5832 images from 648 inpatients and outpatients, capturing 3 gaze positions under diverse conditions, including obstructions and varying distances. The preprocessing algorithm, based on a rotated bounding box detection framework, was trained and evaluated using precision, recall, and mean average precision (mAP) at various intersection over union (IoU) thresholds. A 5-fold cross-validation was performed on an inpatient dataset, with additional testing on an independent outpatient dataset and an external cross-population dataset of 500 images from the IMDB-WIKI collection, representing diverse ethnicities and ages. Expert validation confirmed alignment with clinical standards across 96 images (48 from a Chinese dataset of patients with strabismus and 48 from IMDB-WIKI). Gradient-weighted class activation mapping (Grad-CAM) heatmaps were used to assess model interpretability. A control experiment with 5 optometry specialists compared manual and automated cropping efficiency. Downstream task validation involved preprocessing 1000 primary gaze photographs using the Dlib toolkit, a Faster Region-Based Convolutional Neural Network (Faster R-CNN; both without head tilt correction), and our model (with correction), evaluating the impact of head tilt correction via a vision transformer strabismus screening network through 5-fold cross-validation.
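Detections in the evaluation above are scored by intersection over union against expert annotations. As a minimal, hypothetical sketch: the snippet below computes IoU for axis-aligned boxes (the rotated boxes used in the study additionally require polygon intersection) and illustrates how the mAP50 and mAP95 criteria apply progressively stricter overlap thresholds.

```python
def iou_axis_aligned(a, b):
    """Intersection over union for axis-aligned boxes (x1, y1, x2, y2).

    The study's model predicts *rotated* bounding boxes, whose IoU needs
    polygon intersection; this axis-aligned simplification only
    illustrates how a detection is scored against an IoU threshold.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

# A hypothetical predicted eye-region box vs. its ground-truth annotation:
pred, truth = (10, 20, 110, 60), (12, 22, 112, 62)
iou = iou_axis_aligned(pred, truth)
print(round(iou, 3))             # → 0.871
# mAP50 counts this detection as a true positive (IoU >= 0.50);
# the stricter mAP95 criterion (IoU >= 0.95) does not.
print(iou >= 0.50, iou >= 0.95)  # → True False
```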
RESULTS
The model achieved exceptional performance across datasets: on the 5-fold cross-validation set, it recorded a mean precision of 1.000 (95% CI 1.000-1.000), recall of 1.000 (95% CI 1.000-1.000), mAP50 of 0.995 (95% CI 0.995-0.995), and mAP95 of 0.893 (95% CI 0.870-0.918); on the internal independent test set, precision and recall were 1.000, with mAP50 of 0.995 and mAP95 of 0.801; and on the external cross-population test set, precision and recall were 1.000, with mAP50 of 0.937 and mAP95 of 0.792. The control experiment reduced image preparation time from 10 hours for manual cropping of 900 photos to 30 seconds with the automated model. Downstream strabismus screening task validation showed our model (with head tilt correction) achieving an area under the curve (AUC) of 0.917 (95% CI 0.901-0.933), surpassing the Dlib toolkit and Faster R-CNN (both without head tilt correction), which achieved AUCs of 0.856 (P=.02) and 0.884 (P=.05), respectively. Heatmaps highlighted the core ocular focus, aligning with head tilt directions.
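The reported efficiency gain can be checked with simple arithmetic, using only the figures from the control experiment above (900 photos: 10 hours manually vs. 30 seconds automated):

```python
# Throughput comparison implied by the control experiment.
photos = 900
manual_s = 10 * 3600  # 10 hours of manual cropping, in seconds
auto_s = 30           # automated model, in seconds

print(manual_s / photos)  # seconds per photo, manual → 40.0
print(auto_s / photos)    # seconds per photo, automated (~0.033)
print(manual_s / auto_s)  # overall speedup factor → 1200.0
```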
CONCLUSIONS
This study delivers an AI-driven platform featuring a preprocessing algorithm that automates eye region cropping, correcting head tilt variations to improve image quality for AI development and clinical use. Integrated with electronic archives and patient-physician interaction, it enhances workflow efficiency, ensures telemedicine privacy, and supports ophthalmological research and strabismus care.