Fontaine Pierre, Riet Francois-Georges, Castelli Joel, Gnep Khemara, Depeursinge Adrien, Crevoisier Renaud De, Acosta Oscar
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:1667-1670. doi: 10.1109/EMBC44109.2020.9176724.
Hepatocellular carcinoma (HCC) is the sixth most frequent cancer worldwide. It has a poor overall survival rate, mainly because of underlying cirrhosis and the risk of recurrence outside the treated lesion. Quantitative imaging within a radiomics workflow may help assess the probability of survival and may allow treatments to be tailored to the individual patient. In radiomics, a large number of features can be extracted; many of them are correlated across a population and are often surrogates of the same physiopathology. These issues are more pronounced and more difficult to tackle with imbalanced data. Feature selection strategies are therefore required to extract the most informative features with the greatest predictive capability. In this paper, we compared different unsupervised and supervised feature selection strategies in the presence of imbalanced data and optimized them within a machine learning framework. Multiparametric magnetic resonance images from 81 individuals (19 deceased) treated with stereotactic body radiation therapy (SBRT) for inoperable HCC were analyzed. Pre-selection of a reduced set of features based on Affinity Propagation clustering (unsupervised) achieved a significant improvement in AUC compared with other approaches, with and without feature pre-selection. By including the synthetic minority over-sampling technique (SMOTE) for imbalanced data and Random Forest classification, this workflow emerges as an appealing feature selection strategy for survival prediction within radiomics studies.
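The workflow summarized in the abstract (unsupervised pre-selection of features by Affinity Propagation, over-sampling of the minority class, Random Forest classification, evaluation by AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (only the cohort size of 81 with 19 positives mirrors the abstract), the similarity used for clustering (absolute Pearson correlation between features) is an assumption, all hyperparameters are arbitrary, and `smote_like` is a simplified stand-in for SMOTE written here for self-containment (in practice a library implementation such as the one in imbalanced-learn would normally be used). The function names `make_synthetic_cohort`, `select_exemplar_features`, `smote_like`, and `run_sketch` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def make_synthetic_cohort(n=81, n_pos=19, n_latent=5, copies=8, seed=0):
    """Synthetic stand-in for radiomics data: groups of correlated features."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, n_latent))
    # Each latent factor is repeated `copies` times with independent noise.
    X = np.repeat(latent, copies, axis=1)
    X = X + 0.3 * rng.normal(size=X.shape)
    # Outcome driven by the first latent factor; exactly n_pos positives.
    score = latent[:, 0] + 0.5 * rng.normal(size=n)
    y = np.zeros(n, dtype=int)
    y[np.argsort(score)[-n_pos:]] = 1
    return X, y


def select_exemplar_features(X, random_state=0):
    """Cluster features (columns) and keep one exemplar per cluster.

    Assumption: similarity = absolute Pearson correlation between features.
    """
    similarity = np.abs(np.corrcoef(X, rowvar=False))
    ap = AffinityPropagation(affinity="precomputed", damping=0.9,
                             max_iter=1000, random_state=random_state)
    ap.fit(similarity)
    exemplars = ap.cluster_centers_indices_
    if exemplars is None or len(exemplars) == 0:
        # No convergence: fall back to keeping every feature.
        return np.arange(X.shape[1])
    return np.sort(exemplars)


def smote_like(X, y, k=5, random_state=0):
    """Simplified SMOTE-style over-sampling; minority class assumed to be 1.

    New samples are drawn on segments joining a minority sample to one of
    its k nearest minority neighbours, until both classes have equal size.
    """
    rng = np.random.default_rng(random_state)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]  # drop self
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[partner] - X_min[base])
    y_new = np.ones(n_new, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


def run_sketch(seed=0):
    X, y = make_synthetic_cohort(seed=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    # Feature selection and over-sampling use the training split only.
    keep = select_exemplar_features(X_tr, random_state=seed)
    X_bal, y_bal = smote_like(X_tr[:, keep], y_tr, random_state=seed)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_bal, y_bal)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te[:, keep])[:, 1])
    return keep, y_bal, auc


if __name__ == "__main__":
    keep, y_bal, auc = run_sketch()
    print(f"{len(keep)} exemplar features kept, test AUC = {auc:.2f}")
```

Both the feature pre-selection and the over-sampling are fitted on the training split only, so that synthetic minority samples and the choice of exemplar features cannot leak information from the held-out cases into the AUC estimate. With a cohort this small, a single split gives a noisy estimate; repeated or cross-validated splits would be the more usual choice.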