Yamashita Michifumi, Piaseczna Natalia, Takahashi Akira, Kiyozawa Daisuke, Tatsumoto Narihito, Kaneko Shohei, Zurek Natalia, Gertych Arkadiusz
Department of Pathology and Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States.
Faculty of Biomedical Engineering, Silesian University of Technology, Zabrze, Poland; Innovation Centre for Digital Medicine, National Information Processing Institute, Warsaw, Poland.
Comput Methods Programs Biomed. 2025 Aug;268:108842. doi: 10.1016/j.cmpb.2025.108842. Epub 2025 May 7.
Measuring the thickness of the glomerular basement membrane (GBM) and assessing the percentage of podocyte foot process effacement (%PFPE) are important for diagnosing non-neoplastic kidney diseases. However, when nephropathologists perform these assessments manually on electron microscopy (EM) images, they are hindered by the lack of universally standardized guidelines, which creates technical challenges. We have developed a novel deep learning (DL)-based pipeline that has the potential to reduce human error and to improve the consistency and efficiency of GBM thickness and %PFPE quantification.
This study used 196 EM images from kidney biopsies (representing 21 different kidney diseases in 83 subjects), which were manually annotated by consensus of 3 nephrologists and 2 nephropathologists to provide ground truth (GT) masks of GBMs, podocytes, red blood cells and other glomerular ultrastructures. Of these, 165 images were used to develop two DL models (based on the DeepLabV3+ and U-Net architectures) for EM image segmentation. The models were then evaluated and compared for segmentation accuracy on the remaining 31 images, and the predicted GBM and podocyte masks were analyzed by algorithms in the pipeline that automatically measured the corrected harmonic mean of GBM thickness (cmGBM) and estimated the %PFPE. These automated measurements were statistically compared with the cmGBM measured and the %PFPE estimated from the consensus GBM and podocyte GT masks. The goal was to identify differences between the measurements provided by the three methods (GT masks, DeepLabV3+ masks and U-Net masks). Statistical evaluations used the intraclass correlation coefficient (ICC) and Bland-Altman plots, which estimated the bias and limits of agreement (LoAs) between the GT and DL mask-based measurements.
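The harmonic mean is commonly used for GBM thickness because it is less influenced by the overly long intercepts produced by oblique sectioning. The sketch below shows one way a cmGBM-style value could be computed from thickness samples taken along a segmented GBM mask. It assumes the samples are orthogonal intercept lengths in nm and that the "correction" is the 8/(3π) factor of the orthogonal-intercept method; the abstract does not state the pipeline's actual sampling scheme or correction, so the function name and the correction constant are hypothetical.

```python
import math
from statistics import harmonic_mean

# ASSUMPTION: stereological correction of the orthogonal-intercept method.
# The pipeline's actual "correction" for cmGBM is not given in the abstract.
ORTHOGONAL_INTERCEPT_CORRECTION = 8.0 / (3.0 * math.pi)


def corrected_harmonic_mean_thickness(intercepts_nm,
                                      correction=ORTHOGONAL_INTERCEPT_CORRECTION):
    """Hypothetical cmGBM sketch.

    intercepts_nm: positive GBM thickness samples (assumed orthogonal
    intercept lengths in nm) measured at points along a GBM mask.
    Returns correction * harmonic mean of the samples.
    """
    values = [float(v) for v in intercepts_nm]
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one positive thickness sample")
    # Harmonic mean: n / sum(1 / x_i); long (oblique) intercepts contribute little.
    return correction * harmonic_mean(values)
```

For example, `corrected_harmonic_mean_thickness([300, 350, 400])` applies the assumed correction to the harmonic mean of three intercepts; passing `correction=1.0` returns the uncorrected harmonic mean.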
On the 31 test set images, the DeepLabV3+ model achieved a global accuracy (gACC) of 92.8 % and a weighted intersection over union (wIoU) of 0.869, outperforming the U-Net model, which recorded a gACC of 88.9 % and a wIoU of 0.800. For GBM thickness, the cmGBM derived from DeepLabV3+ masks showed excellent agreement with GT mask-based measurements (ICC = 0.991, p < 0.001), whereas the U-Net model showed good agreement (ICC = 0.881, p < 0.001). The %PFPE estimates obtained from the DL-generated podocyte masks were highly consistent with the GT-based estimates, with ICC values of 0.926 and 0.928 for DeepLabV3+ and U-Net, respectively. The Bland-Altman plots revealed a positive bias in the cmGBM and %PFPE obtained from DeepLabV3+ masks and a negative bias in those obtained from U-Net masks. However, the DeepLabV3+ masks yielded narrower LoA ranges than the U-Net masks for cmGBM.
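The two segmentation metrics reported above can be sketched from a pixel confusion matrix. This assumes the common definitions: gACC is the fraction of correctly labeled pixels over all classes, and wIoU is the per-class IoU averaged with weights proportional to each class's number of GT pixels. The abstract does not say how results were aggregated over the 31 test images; summing the per-image confusion matrices before computing the metrics is one plausible choice, not necessarily the authors'. Function names are illustrative.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    """Pixel confusion matrix: rows = GT class, columns = predicted class."""
    gt = np.asarray(gt, dtype=np.int64).ravel()
    pred = np.asarray(pred, dtype=np.int64).ravel()
    idx = gt * num_classes + pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def global_accuracy(cm):
    """Correctly classified pixels divided by all pixels."""
    return float(np.trace(cm) / cm.sum())


def weighted_iou(cm):
    """IoU per class, weighted by the class's share of GT pixels."""
    tp = np.diag(cm).astype(float)
    gt_pixels = cm.sum(axis=1).astype(float)     # pixels per true class
    pred_pixels = cm.sum(axis=0).astype(float)   # pixels per predicted class
    union = gt_pixels + pred_pixels - tp
    present = gt_pixels > 0                      # skip classes absent from GT
    iou = tp[present] / union[present]
    weights = gt_pixels[present] / gt_pixels.sum()
    return float(np.sum(weights * iou))
```

Usage: build one matrix per test image with `confusion_matrix(gt_mask, pred_mask, num_classes)`, add them together, then call `global_accuracy` and `weighted_iou` on the sum.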
This study highlights the potential of AI to address the limitations of manual assessment of glomerular ultrastructures in EM images by providing comprehensive, objective and accurate measurements of GBM thickness and estimates of %PFPE. Our pipeline with DeepLabV3+ demonstrated robust EM image segmentation performance and excellent measurement reliability compared with expert ground truth. Further refinement of this AI-driven method is warranted to advance diagnostic capabilities and the standardization of AI in nephropathology.