Wilbur David C, Smith Maxwell L, Cornell Lynn D, Andryushkin Alexander, Pettus Jason R
Corista LLC, Concord, MA, USA.
Department of Laboratory Medicine and Pathology, Mayo Clinic - Scottsdale, Phoenix, AZ, USA.
Histopathology. 2021 Oct;79(4):499-508. doi: 10.1111/his.14376. Epub 2021 Aug 3.
Machine learning in digital pathology can improve efficiency and accuracy via prescreening with automated feature identification. Studies using uniform histological material have shown promise. Generalised application requires validation on slides from multiple institutions. We used machine learning to identify glomeruli on renal biopsies and compared performance between single and multiple institutions.
Randomly selected, adequately sampled renal core biopsy cases (n = 71), each consisting of four stains (haematoxylin and eosin, trichrome, silver, periodic acid-Schiff), were obtained from three institutions and digitised at ×40. Glomeruli were manually annotated by three renal pathologists using a digital tool. Cases were divided into training/validation (n = 52) and evaluation (n = 19) cohorts. An algorithm was trained to develop three convolutional neural network (CNN) models, which were tested on case cohorts intra- and inter-institutionally. Raw CNN search data from each of the four slides per case were merged into composite regions of interest containing putative glomeruli. The sensitivity and modified specificity of glomerulus detection (versus annotated truth) were calculated for each model/cohort. Intra-institutional (3) sensitivity ranged from 90 to 93%, with modified specificity from 86 to 98%. Inter-institutional (1) sensitivity was 77%, with modified specificity of 97%. Combined intra- and inter-institutional (1) sensitivity was 86%, with modified specificity of 92%.
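The merging and scoring steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that detections from the four stains have already been registered into a shared coordinate frame, that detections and annotations are axis-aligned bounding boxes, and that an annotated glomerulus counts as detected when any composite region overlaps it. All function names and coordinates are hypothetical. The abstract does not define modified specificity, so the sketch reports only sensitivity and a count of composite regions that match no annotation.

```python
from typing import List, Tuple

# Hypothetical representation: (x_min, y_min, x_max, y_max) in a shared frame.
Box = Tuple[float, float, float, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True when two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_boxes(boxes: List[Box]) -> List[Box]:
    """Merge overlapping boxes into composite regions (union bounding boxes)."""
    merged: List[Box] = []
    for box in boxes:
        current = box
        absorbed = True
        while absorbed:  # repeat until the grown box touches nothing else
            absorbed = False
            for other in merged:
                if boxes_overlap(current, other):
                    merged.remove(other)
                    current = (
                        min(current[0], other[0]),
                        min(current[1], other[1]),
                        max(current[2], other[2]),
                        max(current[3], other[3]),
                    )
                    absorbed = True
                    break
        merged.append(current)
    return merged


def sensitivity(truth: List[Box], rois: List[Box]) -> float:
    """Fraction of annotated glomeruli overlapped by at least one region."""
    if not truth:
        return float("nan")
    found = sum(1 for t in truth if any(boxes_overlap(t, r) for r in rois))
    return found / len(truth)


def unmatched_rois(truth: List[Box], rois: List[Box]) -> int:
    """Number of composite regions that overlap no annotated glomerulus."""
    return sum(1 for r in rois if not any(boxes_overlap(r, t) for t in truth))


# Made-up detections for one case, one list per stain (assumed registered).
detections_by_stain = {
    "H&E": [(0, 0, 10, 10), (50, 50, 60, 60)],
    "trichrome": [(5, 5, 15, 15)],
    "silver": [(100, 100, 110, 110)],
    "PAS": [(52, 52, 62, 62)],
}
all_detections = [b for boxes in detections_by_stain.values() for b in boxes]
composite_rois = merge_boxes(all_detections)

# Made-up pathologist annotations for the same case.
annotated = [(2, 2, 8, 8), (51, 51, 59, 59), (200, 200, 210, 210)]
case_sensitivity = sensitivity(annotated, composite_rois)
case_unmatched = unmatched_rois(annotated, composite_rois)
```

In this toy case the five raw detections collapse into three composite regions. Two of the three annotated glomeruli are covered, and one region matches no annotation. A real pipeline would probably use a stricter matching rule, such as a minimum overlap fraction, and would pool counts across all cases in a cohort before computing the rates.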
Feature detection sensitivity degrades when training and test material originate from different sites. Training using a combined set of digital slides from three institutions improves performance. Differing histology methods probably account for algorithm performance contrasts. Our data highlight the need for diverse training sets for the development of generalisable machine learning histology algorithms.