Zhao Xingchen, Sicilia Anthony, Minhas Davneet S, O'Connor Erin E, Aizenstein Howard J, Klunk William E, Tudorascu Dana L, Hwang Seong Jae
Department of Computer Science, University of Pittsburgh.
Intelligent Systems Program - University of Pittsburgh.
Proc IEEE Int Symp Biomed Imaging. 2021 Apr;2021:1047-1051. doi: 10.1109/ISBI48211.2021.9434034. Epub 2021 May 25.
Typical machine learning frameworks rely heavily on the underlying assumption that training and test data follow the same distribution. In medical imaging, where datasets are increasingly acquired from multiple sites or scanners, this identical-distribution assumption often fails to hold because of systematic variability induced by site- or scanner-dependent factors. We therefore cannot simply expect a model trained on a given dataset to work consistently well, or generalize, on a dataset from another distribution. In this work, we address this problem by investigating the application of machine learning models to unseen medical imaging data. Specifically, we consider the challenging case of Domain Generalization (DG), where we train a model without any knowledge of the testing distribution. That is, we train on samples from a set of distributions (sources) and test on samples from a new, unseen distribution (target). We focus on the task of white matter hyperintensity (WMH) prediction using the multi-site WMH Segmentation Challenge dataset and our local in-house dataset. We identify how two mechanically distinct DG approaches, namely domain adversarial learning and mix-up, have theoretical synergy. We then show drastic improvements in WMH prediction on an unseen target domain.
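As a rough illustration of two ideas named in the abstract, the sketch below shows a generic mix-up step (a convex combination of two training samples and their labels) and a leave-one-domain-out split matching the DG setting of training on sources and testing on an unseen target. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `alpha` value, the flat-list data layout, and the site labels are all hypothetical, and the abstract does not say how the paper pairs or weights samples.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two (input, label) pairs with a weight drawn from Beta(alpha, alpha).

    Inputs and labels are flat lists of floats (an assumption made for brevity;
    real images and segmentation masks would be arrays).
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


def leave_one_domain_out(domains):
    """Yield (sources, target) splits: each domain is held out once as the unseen target."""
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the demo is repeatable
    x_mix, y_mix, lam = mixup_pair([0.0, 1.0], [0.0, 1.0],
                                   [1.0, 0.0], [1.0, 0.0], rng=rng)
    print("lambda:", lam, "mixed input:", x_mix, "mixed label:", y_mix)

    # Hypothetical site names; the target site is never seen during training.
    for sources, target in leave_one_domain_out(["site_A", "site_B", "site_C"]):
        print("train on", sources, "-> test on", target)
```

In a DG experiment along these lines, the model would be fit only on the source domains of each split, with mixed samples used as additional training data, and evaluated once on the held-out target.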