Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, 94158, USA.
Institute for Neurodegenerative Diseases, University of California, San Francisco, CA, 94158, USA.
Acta Neuropathol Commun. 2022 Apr 28;10(1):66. doi: 10.1186/s40478-022-01365-0.
Pathologists can label pathologies differently, making it challenging to yield consistent assessments in the absence of one ground truth. To address this problem, we present a deep learning (DL) approach that draws on a cohort of experts, weighs each contribution, and is robust to noisy labels. We collected 100,495 annotations on 20,099 candidate amyloid beta neuropathologies (cerebral amyloid angiopathy (CAA), and cored and diffuse plaques) from three institutions, independently annotated by five experts. DL models trained with a consensus-of-two strategy yielded 12.6-26% improvements in area under the precision-recall curve (AUPRC) compared with models that learned individualized annotations. This strategy surpassed individual-expert models, even when unfairly assessed on benchmarks favoring them. Moreover, ensembling over individual models was robust to hidden random annotators. In blind prospective tests on 52,555 subsequently expert-annotated images, the models labeled pathologies like their human counterparts (consensus model AUPRC = 0.74 cored; 0.69 CAA). This study demonstrates a means to combine multiple ground truths into a common-ground DL model that yields consistent diagnoses informed by multiple, and potentially variable, expert opinions.
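The consensus-of-two strategy described above can be sketched as a simple vote-thresholding step over the five experts' independent annotations. This is a minimal illustration, not the paper's actual pipeline; the function name, threshold parameter, and toy data are assumptions made here for clarity.

```python
import numpy as np

def consensus_labels(annotations: np.ndarray, threshold: int = 2) -> np.ndarray:
    """Derive a binary consensus label per candidate pathology.

    annotations: (n_images, n_annotators) binary matrix, 1 = expert
    marked the candidate as positive. A candidate is labeled positive
    when at least `threshold` experts agree (consensus-of-two by default).
    Hypothetical sketch; names and data are illustrative.
    """
    return (annotations.sum(axis=1) >= threshold).astype(int)

# Toy example: five experts annotate four candidate plaques.
votes = np.array([
    [1, 1, 0, 0, 0],  # two experts agree -> positive
    [1, 0, 0, 0, 0],  # only one vote     -> negative
    [1, 1, 1, 1, 1],  # unanimous         -> positive
    [0, 0, 0, 0, 0],  # no votes          -> negative
])
print(consensus_labels(votes))  # -> [1 0 1 0]
```

Training on these consensus labels, rather than on any single expert's annotations, is what the abstract credits with the 12.6-26% AUPRC improvement over individualized-annotation models.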