Fell Christina, Mohammadi Mahnaz, Morrison David, Arandjelovic Ognjen, Caie Peter, Harris-Birtill David
School of Computer Science, University of St Andrews, St Andrews, United Kingdom.
Indica Labs, Albuquerque, New Mexico, United States of America.
PLOS Digit Health. 2022 Dec 2;1(12):e0000145. doi: 10.1371/journal.pdig.0000145. eCollection 2022 Dec.
For a method to be widely adopted in medical research or clinical practice, it needs to be reproducible so that clinicians and regulators can have confidence in its use. Machine learning and deep learning face a particular set of challenges around reproducibility: small differences in the settings or the data used to train a model can lead to large differences in experimental outcomes. In this work, three top-performing algorithms from the Camelyon grand challenges are reproduced using only the information presented in the associated papers, and the results are then compared to those reported. Seemingly minor details were found to be critical to performance, yet their importance is difficult to appreciate until reproduction is actually attempted. We observed that authors generally describe the key technical aspects of their models well but fail to maintain the same reporting standards for data preprocessing, which is essential to reproducibility. As an important contribution of the present study and its findings, we introduce a reproducibility checklist that tabulates the information that needs to be reported in histopathology ML-based work in order to make it reproducible.