Department of Ophthalmology, Heinrich-Heine University Düsseldorf, Moorenstr. 5, 40225 Düsseldorf, Germany.
Graefes Arch Clin Exp Ophthalmol. 2022 Aug;260(8):2605-2612. doi: 10.1007/s00417-022-05574-0. Epub 2022 Mar 31.
Corneal fluorescein staining is one of the most important diagnostic tests in dry eye disease (DED). Nevertheless, the result of this examination depends on the grader. To date, no method for automated quantification of corneal staining is commercially available. The aim of this study was to develop a software-assisted grading algorithm and to compare it with a group of human graders of variable clinical experience in patients with DED.
Fifty images of eyes of patients with DED, stained with 2 µl of 2% fluorescein and showing different severities of superficial punctate keratopathy, were taken under standardized conditions. An algorithm for detecting and counting superficial punctate keratitis was developed in ImageJ using a training dataset of 20 randomly picked images. The test dataset of the remaining 30 images was then analyzed (1) by the ImageJ algorithm and (2) by 22 graders, all ophthalmologists with different levels of experience. All graders evaluated the images with the Oxford grading scheme for corneal staining at baseline and again after 6-8 weeks. Intrarater agreement was also evaluated by adding a mirrored version of every original image to the image set during the second grading.
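The abstract does not publish the ImageJ macro itself. As a minimal sketch of a comparable thresholding-and-particle-counting pipeline, the following Python code (using scikit-image; the channel choice, threshold method, and size limits are illustrative assumptions, not the authors' settings) conveys the general approach:

```python
import numpy as np
from skimage import io, filters, measure, morphology

def count_punctate_lesions(path, min_area=5, max_area=500):
    """Count bright punctate fluorescein spots in a cornea photograph.

    Illustrative stand-in for the paper's ImageJ workflow; the actual
    algorithm, its preprocessing, and its size limits are not given in
    the abstract.
    """
    img = io.imread(path)
    green = img[:, :, 1].astype(float)       # fluorescein emits mainly in the green channel (assumed)
    thresh = filters.threshold_otsu(green)   # global Otsu threshold (assumed)
    mask = green > thresh
    mask = morphology.remove_small_objects(mask, min_size=min_area)  # drop noise pixels
    labels = measure.label(mask)             # connected components = candidate lesions
    regions = [r for r in measure.regionprops(labels)
               if min_area <= r.area <= max_area]  # exclude large confluent staining
    return len(regions)

print(count_punctate_lesions("cornea.png"))  # hypothetical input file
```

The particle count returned by such a pipeline is a continuous surrogate for staining severity, which is what allows it to be compared against ordinal Oxford grades.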
The count of particles detected by the algorithm correlated significantly with the estimated true Oxford grade (Sr = 0.91; n = 30; p < 0.01). Overall, human graders showed only moderate intrarater agreement (K = 0.426), while software-assisted grading was perfectly repeatable (K = 1.0). Little difference in intrarater agreement was found between specialists and non-specialists (K = 0.436 for specialists; K = 0.417 for non-specialists). The highest interrater agreement (75.6%) was seen in the most experienced grader, a cornea specialist with 29 years of experience, and the lowest (25.6%) in a resident with only 2 years of experience.
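For reference, the two statistics reported above (Spearman rank correlation and Cohen's kappa) can be computed from raw gradings with standard libraries; the arrays below are placeholders for illustration, not study data:

```python
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# Placeholder values, not the study's measurements.
particle_counts = [12, 40, 3, 55, 21]   # algorithm output per image
oxford_grades   = [1, 3, 0, 4, 2]       # estimated true Oxford grade per image

# Rank correlation of counts vs. grades (paper reports Sr = 0.91, p < 0.01).
rs, p = spearmanr(particle_counts, oxford_grades)

grading_1 = [1, 3, 0, 4, 2]  # a grader's first session
grading_2 = [1, 2, 0, 4, 3]  # same grader on mirrored images, 6-8 weeks later

# Intrarater agreement (paper reports overall K = 0.426 for human graders).
kappa = cohen_kappa_score(grading_1, grading_2)
```

Because a deterministic algorithm returns identical counts for an image and its mirror, its kappa is 1.0 by construction, which is the consistency advantage the study highlights.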
The variance in human grading of corneal staining, albeit small, is likely to have little impact on clinical management and thus seems acceptable. While human graders give results sufficient for clinical application, software-assisted grading of corneal staining ensures higher consistency and is therefore preferable for re-evaluating patients, e.g., in clinical trials.