Nahass George R, Koehler Emma, Tomaras Nicholas, Lopez Danny, Cheung Madison, Palacios Alexander, Peterson Jeffrey C, Hubschman Sasha, Green Kelsey, Purnell Chad A, Setabutr Pete, Tran Ann Q, Yi Darvin
Department of Ophthalmology, University of Illinois Chicago College of Medicine, Chicago, Illinois.
Department of Biomedical Engineering, University of Illinois Chicago, Chicago, Illinois.
Ophthalmol Sci. 2025 Mar 5;5(4):100757. doi: 10.1016/j.xops.2025.100757. eCollection 2025 Jul-Aug.
We aimed to create and validate a dataset for oculoplastic segmentation and periorbital distance prediction.
This was an experimental study.
Images of faces from 2 open-source datasets were included in this study.
The images were sourced from 2 open-source datasets and cropped to include only the eyes. Five trained annotators segmented the iris, sclera, lid, caruncle, and brow in every image. For the intergrader reliability analysis, all 5 annotators annotated the same 100 randomly selected images after a forgetting period of at least 2 weeks. For the intragrader analysis, the 5 annotators annotated the same 20 images after a 2-week forgetting period. Three DeepLabV3 segmentation models were trained on the datasets following standard procedures.
Annotation quality was evaluated by Dice score in intragrader and intergrader experiments. Segmentation models were trained to demonstrate the dataset's utility for deep learning, and these models were also evaluated by Dice score.
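As an illustration of the metric, the following is a minimal sketch of a per-class Dice score between two label maps, Dice = 2|A ∩ B| / (|A| + |B|). The label ids, the toy masks, and the function name `dice_score` are hypothetical and are not taken from the dataset or the authors' code; treating two empty masks as perfect agreement is likewise an assumption.

```python
def dice_score(mask_a, mask_b, label):
    """Dice = 2|A ∩ B| / (|A| + |B|) for one class label in two label maps."""
    # Flatten each 2D label map into a list of booleans for the chosen class.
    a = [px == label for row in mask_a for px in row]
    b = [px == label for row in mask_b for px in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same shape")
    intersection = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Assumption: if neither mask contains the class, count it as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical label ids; the dataset's actual encoding may differ.
IRIS, SCLERA = 1, 2

# Two toy annotations of the same 2 x 6 crop that disagree on one pixel.
grader_1 = [
    [0, 2, 1, 1, 2, 0],
    [0, 2, 1, 1, 2, 0],
]
grader_2 = [
    [0, 2, 2, 1, 2, 0],
    [0, 2, 1, 1, 2, 0],
]

iris_dice = dice_score(grader_1, grader_2, IRIS)
sclera_dice = dice_score(grader_1, grader_2, SCLERA)
print(f"iris Dice = {iris_dice:.3f}, sclera Dice = {sclera_dice:.3f}")
```

Computing the score per class and then averaging is one way to obtain an "across all classes" figure; the abstract does not state which averaging scheme was used.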
We annotated 2842 images. Agreement between annotators (intergrader) on a randomly selected subset of 100 images was high, with an average Dice score of 0.82 ± 0.01. Intragrader analysis also demonstrated that the same grader accurately reproduced annotations, with an average Dice score across all classes of 0.81 ± 0.08. The average Dice scores across all classes for segmentation networks trained on the Chicago Facial dataset, the CelebAMask-HQ dataset, and both combined were 0.90 ± 0.11, 0.81 ± 0.20, and 0.84 ± 0.18, respectively.
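One plausible way to arrive at a "mean ± SD" agreement figure is to average Dice over every pair of annotators. The sketch below assumes that pairwise scheme; the abstract does not specify how scores were aggregated, and the grader names and toy binary masks are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def dice(a, b):
    """Dice score between two flat binary masks (lists of 0/1)."""
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical annotations of the same image by three graders.
annotations = {
    "grader_a": [1, 1, 1, 0, 0, 0],
    "grader_b": [1, 1, 0, 0, 0, 0],
    "grader_c": [1, 1, 1, 1, 0, 0],
}

# Score every unordered pair of graders once.
pairwise = [
    dice(annotations[p], annotations[q])
    for p, q in combinations(sorted(annotations), 2)
]

print(f"intergrader Dice = {mean(pairwise):.2f} ± {stdev(pairwise):.2f}")
```

The same loop applies to an intragrader comparison if each pair consists of one grader's first and repeat annotation of an image.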
We have developed a first-of-its-kind dataset for use in oculoplastic and craniofacial segmentation tasks. All the annotations are publicly available for free download. Access to segmentation datasets designed specifically for oculoplastic surgery will permit more rapid development of clinically useful segmentation networks that can be leveraged for periorbital distance prediction and other downstream tasks. In addition to the annotations, we also provide an open-source toolkit for periorbital distance prediction from segmentation masks, which is available via an application programming interface. The weights of all models have also been open-sourced and are publicly available for use by the community.
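To make the idea of predicting distances from segmentation masks concrete, here is a minimal sketch that measures the eye opening in pixels and converts it to millimeters using the iris width as a scale. It does not reproduce the authors' toolkit or its API: the function names, label ids, and toy mask are hypothetical, and the 11.7 mm iris diameter is an assumed population-average value used only for illustration.

```python
# Hypothetical label ids; the dataset's actual encoding may differ.
IRIS, SCLERA = 1, 2

# Assumption: a fixed average horizontal iris diameter serves as the pixel-to-mm scale.
ASSUMED_IRIS_DIAMETER_MM = 11.7


def bounding_box(mask, labels):
    """Return (top, bottom, left, right) pixel indices covering the given labels."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, px in enumerate(row)
        if px in labels
    ]
    if not coords:
        raise ValueError("no pixels with the requested labels")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), max(rows), min(cols), max(cols)


def fissure_measurements_mm(mask):
    """Estimate the height and width of the eye opening (iris plus sclera) in mm."""
    _, _, iris_left, iris_right = bounding_box(mask, {IRIS})
    iris_width_px = iris_right - iris_left + 1
    mm_per_px = ASSUMED_IRIS_DIAMETER_MM / iris_width_px
    top, bottom, left, right = bounding_box(mask, {IRIS, SCLERA})
    return {
        "height_mm": (bottom - top + 1) * mm_per_px,
        "width_mm": (right - left + 1) * mm_per_px,
    }


# Toy 5 x 9 mask of a single eye: 0 = background, 1 = iris, 2 = sclera.
toy_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 1, 1, 1, 2, 0, 0],
    [0, 2, 2, 1, 1, 1, 2, 2, 0],
    [0, 0, 2, 1, 1, 1, 2, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

measurements = fissure_measurements_mm(toy_mask)
print(measurements)
```

Scaling by the iris avoids needing a physical ruler in the photograph, at the cost of inheriting whatever error the assumed diameter carries for a given subject.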
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.