Hidden Variables in Deep Learning Digital Pathology and Their Potential to Cause Batch Effects: Prediction Model Study.

Affiliations

Digital Biomarkers for Oncology Group, National Center for Tumor Diseases, German Cancer Research Center (DKFZ), Heidelberg, Germany.

Institute of Pathology, University Hospital Heidelberg, University of Heidelberg, Heidelberg, Germany.

Publication information

J Med Internet Res. 2021 Feb 2;23(2):e23436. doi: 10.2196/23436.

Abstract

BACKGROUND

An increasing number of studies within digital pathology show the potential of artificial intelligence (AI) to diagnose cancer using histological whole slide images, which requires large and diverse data sets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks are able to distinguish/learn hidden variables, these variables can introduce batch effects that compromise the accuracy of classification systems.

OBJECTIVE

The objective of the study was to analyze the learnability of an exemplary selection of hidden variables (patient age, slide preparation date, slide origin, and scanner type) that are commonly found in whole slide image data sets in digital pathology and could create batch effects.

METHODS

We trained four separate convolutional neural networks (CNNs) to learn four variables using a data set of digitized whole slide melanoma images from five different institutes. For robustness, each CNN training and evaluation run was repeated multiple times, and a variable was only considered learnable if the lower bound of the 95% confidence interval of its mean balanced accuracy was above 50.0%.
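The learnability criterion can be illustrated with a minimal sketch. The abstract does not say how the confidence interval was computed, so the normal-approximation interval below is an assumption, as are the function names and the example scores; the 0.5 threshold corresponds to the 50.0% stated above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return mean(recalls)


def ci_lower_bound(scores, confidence=0.95):
    """Lower bound of a two-sided CI for the mean of repeated-run scores.

    Assumption: normal approximation; the study may have used another method.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean(scores) - z * stdev(scores) / sqrt(len(scores))


def is_learnable(scores, threshold=0.5):
    """A variable counts as learnable if the CI lower bound exceeds the threshold."""
    return ci_lower_bound(scores) > threshold


# Hypothetical balanced accuracies from five repeated runs (not study data).
example_scores = [0.55, 0.57, 0.56, 0.58, 0.54]
print(is_learnable(example_scores))
```

Using the lower bound rather than the mean keeps a variable from being called learnable when a few lucky runs push the average just above the threshold.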

RESULTS

A mean balanced accuracy above 50.0% was achieved for all four tasks, even when considering the lower bound of the 95% confidence interval. Performance between tasks showed wide variation, ranging from 56.1% (slide preparation date) to 100% (slide origin).

CONCLUSIONS

Because all of the analyzed hidden variables are learnable, they have the potential to create batch effects in dermatopathology data sets, which negatively affect AI-based classification systems. Practitioners should be aware of these and similar pitfalls when developing and evaluating such systems and address these and potentially other batch effect variables in their data sets through sufficient data set stratification.
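One possible reading of "data set stratification" is a train/test split that keeps every combination of diagnostic label and hidden variable proportionally represented on both sides. The sketch below illustrates that idea only; the abstract does not describe the procedure, and the record fields (`label`, `scanner`) and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(records, keys, test_fraction=0.2, seed=0):
    """Split records so each combination of `keys` appears in both parts.

    `records` is a list of dicts; `keys` names the fields to stratify on,
    for example the class label plus a suspected batch-effect variable.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in keys)].append(rec)

    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical slide metadata: 10 slides per (label, scanner) combination.
slides = [
    {"id": f"{label}-{scanner}-{i}", "label": label, "scanner": scanner}
    for label in ("melanoma", "nevus")
    for scanner in ("A", "B")
    for i in range(10)
]
train, test = stratified_split(slides, keys=("label", "scanner"))
print(len(train), len(test))
```

Stratifying on the label alone would leave the scanner distribution to chance, which is the situation in which a network can learn the scanner instead of the diagnosis.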

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0115/7886613/04440607f1a7/jmir_v23i2e23436_fig1.jpg
