Traverso Alberto, Kazmierski Michal, Zhovannik Ivan, Welch Mattea, Wee Leonard, Jaffray David, Dekker Andre, Hope Andrew
Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre+, Maastricht, The Netherlands; Radiation Medicine Program, Princess Margaret Cancer Centre, Toronto, Canada.
Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre+, Maastricht, The Netherlands.
Phys Med. 2020 Mar;71:24-30. doi: 10.1016/j.ejmp.2020.02.010. Epub 2020 Feb 20.
Highlighting the risk of biases in radiomics-based models will help improve their quality and increase their use as decision support systems in the clinic. In this study, we use machine learning-based methods to identify volume-confounding effects in radiomics features.
Methods: A total of 841 radiomics features were extracted from two publicly available retrospective datasets of lung and head and neck cancers using open-source software. Unsupervised hierarchical clustering and principal component analysis (PCA) were used to identify relations between radiomics features and clinical outcome (overall survival). Bootstrapping with logistic regression was used to verify the features' prognostic power and robustness.
Results: Over 80% of the features had large pairwise correlations. Nearly 30% of the features showed strong correlations with tumor volume. Using only volume-independent features for clustering and PCA did not allow risk stratification of patients. Clinical predictors outperformed radiomics features in the bootstrapped logistic regression analyses.
Conclusions: Adopting safeguards in radiomics is imperative to improve the quality of radiomics studies. We propose machine learning (ML)-based methods for developing robust radiomics signatures.
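A volume-confounding screen of the kind described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes Spearman correlation as the association measure and a hypothetical |rho| > 0.6 cutoff (the abstract states neither), and the helper names `spearman_with_volume`, `volume_confounded` and `pca_scores` are illustrative. Features that pass the screen are projected with PCA, mirroring the abstract's analysis of volume-independent features. The data are synthetic.

```python
import numpy as np

def spearman_with_volume(features, volume):
    """Spearman rho between tumor volume and each feature column.

    Ranks come from a double argsort, so ties are not averaged; this is
    adequate for continuous features in a sketch.
    """
    data = np.column_stack([volume, features])
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    rho = np.corrcoef(ranks, rowvar=False)  # (1 + n_features) square matrix
    return rho[0, 1:]

def volume_confounded(features, volume, threshold=0.6):
    """Flag features whose |rho| with volume exceeds a hypothetical cutoff."""
    return np.abs(spearman_with_volume(features, volume)) > threshold

def pca_scores(features, n_components=2):
    """Project z-scored features onto their leading principal axes via SVD."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

# Synthetic example (hypothetical data, not the study's cohorts).
rng = np.random.default_rng(0)
n = 200
volume = rng.lognormal(mean=3.0, sigma=1.0, size=n)
features = np.column_stack([
    volume ** (2 / 3) + rng.normal(0, 0.1, n),  # scales like surface area
    np.log(volume) + rng.normal(0, 0.1, n),     # monotone in volume
    rng.normal(size=(n, 3)),                    # volume-independent noise
])
mask = volume_confounded(features, volume)
independent = features[:, ~mask]  # keep only volume-independent features
scores = pca_scores(independent, n_components=2)
print(mask, scores.shape)
```

Rank-based correlation is used here because many shape and texture features are monotone but non-linear in volume (as in the surface-area-like column above), which a Pearson coefficient can understate.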