Fritsch Virgile, Varoquaux Gael, Thyreau Benjamin, Poline Jean-Baptiste, Thirion Bertrand
Parietal Team, INRIA Saclay-Ile-de-France, Saclay, France.
Med Image Comput Comput Assist Interv. 2011;14(Pt 3):264-71. doi: 10.1007/978-3-642-23626-6_33.
Medical imaging datasets used in clinical studies or basic research often comprise highly variable multi-subject data. Statistically controlled inclusion of a subject in a group study, i.e. deciding whether that subject's images should be treated as samples from a given population or rejected as outlier data, is a challenging issue. The informal approaches often used provide no statistical assessment that a given dataset is indeed an outlier, while traditional statistical procedures are not well suited to the noisy, high-dimensional settings encountered in medical imaging, e.g. with functional brain images. In this work, we modify the classical Minimum Covariance Determinant approach by adding a regularization term that ensures that the estimation is well-posed in high-dimensional settings and in the presence of many outliers. We show on simulated and real data that outliers can be detected satisfactorily, even in situations where the number of dimensions of the data exceeds the number of observations.
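To make the idea of a regularized Minimum Covariance Determinant estimate concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the form of the regularization term, so the sketch assumes a simple ridge penalty (adding `ridge` times the identity to the subset covariance). The function name `regularized_mcd`, the default subset size `h`, and the initialization from the coordinate-wise median are likewise illustrative assumptions.

```python
import numpy as np


def regularized_mcd(X, ridge=1.0, h=None, max_iter=50):
    """Hypothetical sketch: MCD-style concentration steps with a ridge term.

    Returns the indices of the retained subset and the squared Mahalanobis
    distance of every observation to the regularized estimate.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if h is None:
        h = n // 2 + 1  # assumed default: just over half of the observations

    def sq_mahalanobis(subset):
        mean = X[subset].mean(axis=0)
        centered = X[subset] - mean
        # Assumption: ridge regularization keeps the covariance invertible
        # even when p exceeds the subset size.
        cov = centered.T @ centered / len(subset) + ridge * np.eye(p)
        diff = X - mean
        return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)

    # Assumed initialization: the h observations closest (in Euclidean
    # distance) to the coordinate-wise median.
    dist_to_median = np.sum((X - np.median(X, axis=0)) ** 2, axis=1)
    subset = np.sort(np.argsort(dist_to_median)[:h])

    # Concentration steps: re-estimate location and scatter on the current
    # subset, then keep the h observations with the smallest distances.
    for _ in range(max_iter):
        d2 = sq_mahalanobis(subset)
        new_subset = np.sort(np.argsort(d2)[:h])
        if np.array_equal(new_subset, subset):
            break
        subset = new_subset
    return subset, sq_mahalanobis(subset)


# Simulated example with more dimensions (60) than observations (48).
rng = np.random.default_rng(0)
inliers = rng.normal(size=(40, 60))
outliers = rng.normal(size=(8, 60)) + 6.0  # shifted group acting as outliers
X = np.vstack([inliers, outliers])
support, d2 = regularized_mcd(X, ridge=1.0)
```

In this toy setting, observations with large values in `d2` are candidate outliers. Choosing the threshold and the amount of regularization is left open here, since the abstract gives no details on either.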