Department of Pathology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois.
Mod Pathol. 2024 Mar;37(3):100422. doi: 10.1016/j.modpat.2024.100422. Epub 2024 Jan 6.
Machine learning (ML) models are poised to transform surgical pathology practice. The most successful models use attention mechanisms to examine whole slides, identify which areas of tissue are diagnostic, and use those areas to guide diagnosis. Tissue contaminants, such as floaters, represent unexpected tissue. Human pathologists are extensively trained to consider and detect tissue contaminants; we examined the impact of contaminants on ML models. We trained 4 whole-slide models. Three operate on placenta and perform the following functions: (1) detection of decidual arteriopathy, (2) estimation of gestational age, and (3) classification of macroscopic placental lesions. We also developed a model to detect prostate cancer in needle biopsies. We designed experiments in which patches of contaminant tissue were randomly sampled from known slides and digitally added to patient slides, and we measured model performance. We measured the proportion of attention given to contaminants and examined the impact of contaminants in the t-distributed stochastic neighbor embedding (t-SNE) feature space. Every model showed performance degradation in response to one or more tissue contaminants. For decidual arteriopathy detection, balanced accuracy decreased from 0.74 to 0.69 ± 0.01 with the addition of 1 patch of prostate tissue for every 100 patches of placenta (1% contaminant). Bladder, added at 10% contaminant, raised the mean absolute error in estimating gestational age from 1.626 weeks to 2.371 ± 0.003 weeks. Blood, incorporated into placental sections, induced false-negative diagnoses of intervillous thrombi. Addition of bladder to prostate cancer needle biopsies induced false positives: a selection of high-attention patches, representing 0.033 mm², resulted in a 97% false-positive rate when added to needle biopsies. Contaminant patches received attention at or above the rate of the average patch of patient tissue. Tissue contaminants induce errors in modern ML models.
The high level of attention given to contaminants indicates a failure to encode biological phenomena. Practitioners should move to quantify and ameliorate this problem.
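To make the contaminant experiment and the attention measurement concrete, the following is a minimal NumPy sketch, not the authors' code. It assumes each slide is represented as a "bag" of precomputed patch feature vectors and that the model pools them with gated attention in the style of Ilse et al. (2018); the abstract does not specify the architecture, so that pooling choice, the feature dimensions, the random stand-in features and weights, and all function names (`add_contaminant`, `gated_attention`, `contaminant_attention`, `balanced_accuracy`) are hypothetical. A contamination rate of 0.01 corresponds to the abstract's "1 patch ... for every 100 patches" (1% contaminant).

```python
import numpy as np


def add_contaminant(patient_feats, contaminant_pool, rate, rng):
    """Digitally add contaminant patches to one slide's bag of patch features.

    rate = contaminant patches per patient patch (0.01 -> 1 per 100, i.e. 1%).
    Contaminant patches are drawn at random, with replacement, from a pool of
    features taken from known contaminant-tissue slides (hypothetical setup).
    """
    n_add = int(round(rate * len(patient_feats)))
    idx = rng.choice(len(contaminant_pool), size=n_add, replace=True)
    bag = np.concatenate([patient_feats, contaminant_pool[idx]], axis=0)
    is_contam = np.zeros(len(bag), dtype=bool)
    is_contam[len(patient_feats):] = True  # contaminants are appended at the end
    return bag, is_contam


def gated_attention(bag, V, U, w):
    """Gated attention weights over the patches in a bag (assumed pooling)."""
    gate = 1.0 / (1.0 + np.exp(-(bag @ U.T)))  # sigmoid gate, shape (N, L)
    scores = (np.tanh(bag @ V.T) * gate) @ w   # one score per patch, shape (N,)
    scores = scores - scores.max()             # stabilize the softmax
    attn = np.exp(scores)
    return attn / attn.sum()                   # softmax: weights sum to 1


def contaminant_attention(attn, is_contam):
    """Share of total attention on contaminants, and the ratio of the mean
    attention per contaminant patch to the mean attention per patient patch
    (ratio >= 1 means contaminants are attended at or above the average)."""
    share = attn[is_contam].sum()
    ratio = attn[is_contam].mean() / attn[~is_contam].mean()
    return share, ratio


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = (y_pred & y_true).sum() / y_true.sum()
    specificity = (~y_pred & ~y_true).sum() / (~y_true).sum()
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, L = 64, 32                                # hypothetical feature / attention dims
    patient = rng.normal(size=(1000, D))         # stand-in for placenta patch features
    pool = rng.normal(loc=0.5, size=(500, D))    # stand-in for contaminant patch features
    V = rng.normal(size=(L, D)) / np.sqrt(D)     # untrained stand-in weights; a real
    U = rng.normal(size=(L, D)) / np.sqrt(D)     # study would use the trained model's
    w = rng.normal(size=L) / np.sqrt(L)
    bag, is_contam = add_contaminant(patient, pool, rate=0.01, rng=rng)
    attn = gated_attention(bag, V, U, w)
    share, ratio = contaminant_attention(attn, is_contam)
    print(f"{is_contam.sum()} contaminant patches, {share:.3%} of attention, ratio {ratio:.2f}")
```

Repeating the sampling over several random draws and re-scoring the model on the contaminated bags would give a mean ± spread for a metric such as balanced accuracy, which is one plausible reading of how values like 0.69 ± 0.01 arise; the abstract does not state the number of repetitions.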