Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary.
Health Services Management Training Centre, Semmelweis University, Budapest, Hungary.
Sci Rep. 2022 Dec 9;12(1):21302. doi: 10.1038/s41598-022-23990-4.
Statistical learning algorithms strongly rely on an oversimplified assumption for optimal performance: that source (training) and target (testing) data are independent and identically distributed. Variation in human tissue, physician labeling, and physical imaging parameters (PIPs) in the generative process yields medical image datasets whose statistics render this central assumption false. When models are deployed, new examples are often out of distribution with respect to the training data; consequently, training robust, dependable, and predictive models remains a challenge in medical imaging, and significant accuracy drops are common for deployed models. This statistical variation between training and testing data is referred to as domain shift (DS). To the best of our knowledge, we provide the first empirical evidence that variation in PIPs between test and train medical image datasets is a significant driver of DS, and that model generalization error is correlated with this variance. We show that significant covariate shift occurs due to a selection bias in sampling from a small area of PIP space, in both inter- and intra-hospital regimes. To show this, we control for population shift, prevalence shift, data selection biases, and annotation biases in order to isolate the effect of the physical generation process on model generalization, using a proxy task of age group estimation on a combined 44k-image mammogram dataset collected from five hospitals. We hypothesize that training data should be sampled evenly from PIP space to produce the most robust models, and we hope this study provides motivation to retain medical image generation metadata, which is almost always discarded or redacted in open-source datasets. This metadata, measured in standard international units, can provide a universal regularizing anchor between distributions generated across the world for all current and future imaging modalities.
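The covariate shift described above can be illustrated with a minimal toy simulation (not the paper's method; all names and parameter values here are hypothetical). A scalar "PIP" (standing in for, say, normalized tube voltage) influences the observed pixel statistic alongside the label. A classifier fit on a hospital that samples a narrow slice of PIP space degrades toward chance on a hospital that samples a shifted slice, even though the label-conditional mechanism is unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hospital(n, pip_low, pip_high):
    """Toy data: one image statistic whose value depends on both the
    proxy label (binary age group) and a physical imaging parameter (PIP)."""
    pip = rng.uniform(pip_low, pip_high, n)        # hospital's slice of PIP space
    age_group = rng.integers(0, 2, n)              # binary proxy label
    feature = age_group + 0.5 * pip + rng.normal(0.0, 0.3, n)
    return feature, age_group

# Hospital A samples a narrow region of PIP space; hospital B a shifted region.
feat_a, y_a = make_hospital(2000, 0.0, 1.0)
feat_b, y_b = make_hospital(2000, 2.0, 3.0)

# "Train" on hospital A: a threshold midway between the class means
# (a minimal stand-in for any learned classifier).
threshold = 0.5 * (feat_a[y_a == 0].mean() + feat_a[y_a == 1].mean())

acc_a = ((feat_a > threshold).astype(int) == y_a).mean()  # in-distribution
acc_b = ((feat_b > threshold).astype(int) == y_b).mean()  # shifted PIPs

print(f"in-distribution accuracy: {acc_a:.2f}")
print(f"shifted-PIP accuracy:     {acc_b:.2f}")
```

Because the feature distribution shifts with the PIP while the decision boundary does not, the out-of-hospital accuracy collapses toward chance, mirroring the generalization error the abstract attributes to narrow sampling of PIP space.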