Ling Tong, Zuo Zhichao, Huang Mingwei, Wu Liucheng, Ma Jie, Huang Xiaoliang, Tang Weizhong
Department of Gastrointestinal Surgery, Guangxi Medical University Cancer Hospital, Nanning, China.
School of Mathematics and Computational Science, Xiangtan University, Xiangtan, China.
Abdom Radiol (NY). 2025 Jul;50(7):2794-2805. doi: 10.1007/s00261-024-04743-5. Epub 2024 Dec 12.
Endoscopic biopsy has limited value for the preoperative assessment of mucinous components in patients with colorectal cancer. This study developed a radiomics model and an explainable machine learning prediction model to differentiate between adenocarcinoma with mucinous components and mucinous adenocarcinoma.
The derivation cohort included 312 patients with colorectal cancer in whom mucinous components were detected on preoperative endoscopic biopsy. These patients were randomly divided into training and validation sets in a 7:3 ratio. Radiomics features were extracted and then reduced through feature engineering to create a radiomic score (radscore). Subsequently, 24 features, including the radscore, clinical data, and serological characteristics, were used to develop models with nine different machine learning algorithms. The SHapley Additive exPlanation (SHAP) method was employed to explain how the machine learning models work and to visualize the contribution of each variable to individual predictions.
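The 7:3 split and the idea behind SHAP can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the random seed, the three toy features, and the toy scoring function are all hypothetical, and the exact Shapley computation below enumerates every feature subset, which is only feasible for a handful of features (SHAP libraries use faster algorithms for models such as random forests).

```python
import random
from itertools import combinations
from math import factorial


def split_7_3(ids, seed=0):
    """Shuffle patient IDs and split them 7:3 into training and validation sets."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)  # seed is an arbitrary illustrative choice
    cut = round(len(ids) * 0.7)
    return ids[:cut], ids[cut:]


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition without feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical score over (radscore, T stage, CEA); not the study's model."""
    radscore, t_stage, cea = z
    return 0.6 * radscore + 0.3 * t_stage + 0.1 * radscore * t_stage + 0.0 * cea


train_ids, valid_ids = split_7_3(range(312))
phi = shapley_values(toy_score, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

In this toy case the attributions sum to the difference between the prediction at `x` and at the baseline, and a feature that never changes the score (the hypothetical CEA term) receives a Shapley value of zero, which mirrors the abstract's distinction between features with zero and non-zero SHAP values.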
Among the nine machine learning models, the random forest model performed best, with the highest area under the curve (AUC) of 0.832; it was therefore defined as the hybrid model. The clinical model, which was built using clinical data and serological characteristics, had an AUC of 0.732, whereas the radiomics model achieved an AUC of 0.810. SHAP model interpretation revealed that among the 14 features with non-zero SHAP values, the radscore and clinical T stage had notably higher values.
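For readers unfamiliar with the metric, the AUC equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up illustrative values, not data from the study, and the function name is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = mucinous adenocarcinoma, 0 = adenocarcinoma with mucinous components
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_auc = auc(labels, scores)  # 8 of 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.732, 0.810, and 0.832 should be read.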
This interpretable predictive model effectively differentiates between adenocarcinoma with mucinous components and mucinous adenocarcinoma in patients with colorectal cancer, thereby facilitating informed treatment decisions for individuals in whom mucinous components are identified during preoperative biopsy diagnosis.