University of Veterinary Medicine Vienna, Vienna, Austria.
Freie Universität Berlin, Berlin, Germany.
Vet Pathol. 2023 Jan;60(1):75-85. doi: 10.1177/03009858221137582. Epub 2022 Nov 17.
Exercise-induced pulmonary hemorrhage (EIPH) is a relevant respiratory disease in sport horses that can be diagnosed by examining bronchoalveolar lavage fluid (BALF) cells using the total hemosiderin score (THS). The aim of this study was to evaluate the diagnostic accuracy and reproducibility of annotators and to validate a deep learning-based algorithm for the THS. Digitized cytological specimens stained for iron were prepared from 52 equine BALF samples. Ten annotators produced a THS for each slide according to published methods. The reference methods for comparing annotators' and algorithmic performance included a ground truth dataset, the mean of the annotators' THSs, and chemical iron measurements. Annotators showed marked interobserver variability of the THS, which was mostly due to a systematic error between annotators in grading the intracytoplasmic hemosiderin content of individual macrophages. Regarding overall measurement error between the annotators, 87.7% of the variance could be reduced by using standardized grades based on the ground truth. The algorithm was highly consistent with the ground truth in assigning hemosiderin grades. Compared with the ground truth THS, annotators had an accuracy of 75.7% in diagnosing EIPH (THS < 75 vs ≥ 75), whereas the algorithm had an accuracy of 92.3%, with no relevant differences in correlation with chemical iron measurements. The results show that deep learning-based algorithms are useful for improving reproducibility and routine applicability of the THS. For THSs determined by experts, a diagnostic uncertainty interval of 40 to 110 is proposed; THSs within this interval have insufficient reproducibility regarding the EIPH diagnosis.
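The abstract does not restate how the THS is calculated, so the following is only a minimal sketch under stated assumptions. It assumes the commonly described scheme in which each macrophage receives a hemosiderin grade from 0 to 4 and the grade sum is scaled to 100 cells, giving a score between 0 and 400. The cutoff of 75 and the 40 to 110 uncertainty interval are taken from the abstract; treating the interval bounds as inclusive is an assumption, and the function names are hypothetical.

```python
from collections import Counter

# Values taken from the abstract.
EIPH_THRESHOLD = 75          # THS >= 75 is read as EIPH
UNCERTAIN_LOW = 40           # proposed diagnostic uncertainty interval
UNCERTAIN_HIGH = 110         # (bounds treated as inclusive: an assumption)


def total_hemosiderin_score(grades):
    """Return the THS for a list of per-macrophage grades (0 to 4).

    Assumed formula: sum of grades, scaled to 100 cells (range 0 to 400).
    """
    if not grades:
        raise ValueError("at least one graded macrophage is required")
    counts = Counter(grades)
    if any(g not in range(5) for g in counts):
        raise ValueError("grades must be integers from 0 to 4")
    grade_sum = sum(grade * n for grade, n in counts.items())
    return grade_sum * 100 / len(grades)


def interpret_ths(ths):
    """Return (diagnosis, is_uncertain) for a THS value."""
    diagnosis = "EIPH" if ths >= EIPH_THRESHOLD else "no EIPH"
    is_uncertain = UNCERTAIN_LOW <= ths <= UNCERTAIN_HIGH
    return diagnosis, is_uncertain


if __name__ == "__main__":
    # Hypothetical slide: 300 macrophages with made-up grade counts.
    example = [0] * 150 + [1] * 90 + [2] * 45 + [3] * 15
    score = total_hemosiderin_score(example)
    print(score, interpret_ths(score))
```

In this sketch a slide scoring exactly at the cutoff is reported as EIPH but also flagged as uncertain, which mirrors the abstract's point that scores between 40 and 110 are not reproducible enough for a confident diagnosis.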