Center for Biomedical Image Computing and Analysis (CBICA), Department of Radiology, University of Pennsylvania, Philadelphia, PA, 19104, USA.
Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA.
Sci Rep. 2022 Nov 8;12(1):19009. doi: 10.1038/s41598-022-23328-0.
Radiomic approaches in precision medicine are promising, but variation associated with image acquisition factors can result in severe biases and low generalizability. Multicenter datasets used in these studies are often heterogeneous in multiple imaging parameters and/or have missing information, resulting in multimodal radiomic feature distributions. ComBat is a promising harmonization tool, but it harmonizes only by a single, known batch variable and assumes that standardized input data are normally distributed. We propose a procedure, called OPNested ComBat, that sequentially harmonizes for multiple batch effects in an optimized order. Furthermore, we propose to address bimodality by employing a Gaussian Mixture Model (GMM) grouping, treated either as a batch variable (OPNested + GMM) or as a protected clinical covariate (OPNested - GMM). Methods were evaluated on features extracted with CapTK and PyRadiomics from two public lung computed tomography (CT) datasets. We found that OPNested ComBat improved harmonization performance over standard ComBat. OPNested + GMM ComBat exhibited the best harmonization performance but the lowest predictive performance, while OPNested - GMM ComBat showed poorer harmonization performance but the highest predictive performance. Our findings emphasize that improved harmonization performance is no guarantee of improved predictive performance, and that these methods show promise for superior standardization of datasets heterogeneous in multiple or unknown imaging parameters and for greater generalizability.
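The two ideas in the abstract, sequential harmonization over several batch variables in a searched order and a GMM grouping used as an extra batch variable, can be illustrated with a minimal sketch. It is not the authors' implementation: a plain per-batch shift-and-scale stands in for ComBat (no empirical Bayes step, no covariates), the ordering criterion `batch_score` is a hypothetical choice, and all function names, variable names, and the synthetic data are assumptions made for illustration.

```python
import itertools
import math
import random
import statistics


def align_batches(values, labels):
    """Shift/scale each batch to the pooled mean and SD (simplified stand-in for ComBat)."""
    grand_mean = statistics.fmean(values)
    grand_sd = statistics.pstdev(values)
    out = list(values)
    for b in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == b]
        m = statistics.fmean([values[i] for i in idx])
        s = statistics.pstdev([values[i] for i in idx]) or 1.0
        for i in idx:
            out[i] = (values[i] - m) / s * grand_sd + grand_mean
    return out


def batch_score(values, batch_vars):
    """Hypothetical residual batch effect: range of batch means in pooled-SD units, summed."""
    sd = statistics.pstdev(values) or 1.0
    total = 0.0
    for labels in batch_vars.values():
        means = [statistics.fmean([v for v, lab in zip(values, labels) if lab == b])
                 for b in set(labels)]
        total += (max(means) - min(means)) / sd
    return total


def nested_harmonize(values, batch_vars, order):
    """Harmonize by one batch variable after another, in the given order."""
    for name in order:
        values = align_batches(values, batch_vars[name])
    return values


def best_order(values, batch_vars):
    """Try every ordering of the batch variables and keep the lowest-scoring result."""
    results = []
    for order in itertools.permutations(batch_vars):
        harmonized = nested_harmonize(values, batch_vars, order)
        results.append((batch_score(harmonized, batch_vars), order, harmonized))
    return min(results, key=lambda r: r[0])


def gmm_two_groups(values, iters=50):
    """Fit a two-component 1-D Gaussian mixture by EM and return hard labels (0 = lower mode)."""
    mu = [min(values), max(values)]
    sd = [statistics.pstdev(values) or 1.0] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        resp = []
        for v in values:  # E-step: component responsibilities
            p = [w[k] * math.exp(-0.5 * ((v - mu[k]) / sd[k]) ** 2) / sd[k] for k in range(2)]
            tot = sum(p) or 1e-300
            resp.append([pk / tot for pk in p])
        for k in range(2):  # M-step: update weight, mean, SD
            nk = sum(r[k] for r in resp) or 1e-12
            mu[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, values)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
            w[k] = nk / len(values)
    return [0 if r[0] >= r[1] else 1 for r in resp]


# Synthetic feature with two known batch variables (names are illustrative only).
random.seed(0)
scanner = ["A", "B"] * 50
kernel = ["soft"] * 50 + ["sharp"] * 50
feature = [random.gauss(0, 1) + (3 if s == "B" else 0) + (2 if k == "sharp" else 0)
           for s, k in zip(scanner, kernel)]
batch_vars = {"scanner": scanner, "kernel": kernel}
before = batch_score(feature, batch_vars)
after, order, harmonized = best_order(feature, batch_vars)

# Bimodal feature whose cause is not recorded: the GMM group acts as a batch variable.
bimodal = [random.gauss(0, 1) for _ in range(50)] + [random.gauss(8, 1) for _ in range(50)]
gmm_labels = gmm_two_groups(bimodal)
gmm_harmonized = align_batches(bimodal, gmm_labels)
```

Treating the GMM group as a batch variable, as in the last two lines, removes the bimodality entirely, which mirrors the OPNested + GMM variant. The OPNested - GMM variant would instead preserve the group difference as a covariate, which needs a regression-based adjustment that this sketch omits.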