Nicholas Chandler Wang, Jeremy Kaplan, Joonsang Lee, Jeffrey Hodgin, Aaron Udager, Arvind Rao
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
Department of Pathology, University of Michigan Medical School, Ann Arbor, MI, USA.
J Pathol Inform. 2021 Dec 24;12:54. doi: 10.4103/jpi.jpi_6_21. eCollection 2021.
Machine learning models provide significant opportunities for improvement in health care, but their "black-box" nature poses many risks.
We built a custom Python module as part of a framework for generating artifacts that are tunable and describable, to support future testing needs. We analyzed a previously published digital pathology classification model and an internally developed kidney tissue segmentation model, applying a variety of generated artifacts and testing their effects on model performance. The simulated artifacts were bubbles, tissue folds, uneven illumination, marker lines, uneven sectioning, altered staining, and tissue tears.
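The module itself is not reproduced in the abstract. As a minimal sketch of what one tunable, describable artifact generator might look like, the function below applies a simulated uneven-illumination gradient to an RGB tile; the function name, parameters, and linear-gradient approach are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def add_uneven_illumination(tile, strength=0.5, axis=1):
    """Simulate uneven illumination by darkening one side of an
    RGB tile with a linear gradient (hypothetical helper).

    tile:     H x W x 3 uint8 array
    strength: fraction of brightness lost at the darkest edge (0..1),
              making the artifact tunable and describable
    axis:     0 for a top-to-bottom gradient, 1 for left-to-right
    """
    n = tile.shape[axis]
    # Brightness multiplier ramps from 1.0 down to (1 - strength).
    ramp = np.linspace(1.0, 1.0 - strength, n)
    # Reshape so the ramp broadcasts along the chosen axis.
    ramp = ramp[:, None, None] if axis == 0 else ramp[None, :, None]
    out = tile.astype(float) * ramp
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage: a uniform gray tile becomes progressively darker to the right.
tile = np.full((4, 4, 3), 200, dtype=np.uint8)
out = add_uneven_illumination(tile, strength=0.5, axis=1)
```

Because the artifact is parameterized (here by `strength` and `axis`), the same tile can be perturbed at graded severities, which is what allows testing whether a model's response to an artifact is linear or not.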
We found some performance degradation on tiles with artifacts, most notably with altered stains but also with marker lines, tissue folds, and uneven sectioning. We also found that the response of deep learning models to artifacts can be nonlinear.
Generated artifacts can provide a useful tool for testing machine learning models and building trust in them by revealing where these models might fail.