A clustering-based approach to address correlated features in predicting genitourinary toxicity from MRI-guided prostate SBRT.

Authors

Rezapoor Pouyan, Pham Jonathan, Neilsen Beth, Liu Hengjie, Cao Minsong, Yang Yingli, Sheng Ke, Ma Ting Martin, Lamb James, Steinberg Michael, Kishan Amar U, Taylor Zachary, Ruan Dan

Affiliations

Department of Electronics and Nanoengineering, Aalto University, Espoo, Finland.

Department of Radiation Oncology, University of California, Los Angeles, USA.

Publication information

Med Phys. 2025 Jun;52(6):5104-5114. doi: 10.1002/mp.17834. Epub 2025 Apr 23.

Abstract

BACKGROUND

Outcome analysis commonly involves a large set of candidate prognostic features. However, combining such high-dimensional input with a relatively small sample size creates a risk of overfitting, poor generalizability, and correlation bias.

PURPOSE

This study addresses the mitigation of correlation bias in predicting genitourinary (GU) toxicity in prostate cancer patients who underwent MRI-guided stereotactic body radiation therapy (SBRT).

METHODS

Typical dimension reduction or feature selection methods rely on sparsity-inducing regularization or information criteria. However, when (subsets of) the input features are heavily correlated, the weights assigned to the correlated features can be diluted to the extent that those features are no longer effective in the prediction, leading to suboptimal feature discovery and prediction. We propose to perform advanced hierarchical clustering and then apply regression modeling to the cluster centroids. This approach addresses the challenges posed by high dimensionality and ill-conditioning, and improves the accuracy and reliability of the resulting prediction models. The performance of the proposed method was evaluated on typical regression models with intrinsic feature reduction, namely Least Absolute Shrinkage and Selection Operator (LASSO) regularized logistic regression (LR), support vector machine (SVM), and decision trees (DT).
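The clustering step described above can be illustrated with a minimal sketch. The abstract does not specify the "advanced hierarchical clustering" variant, so the choices here are assumptions for illustration only: average linkage, a distance of one minus the Pearson correlation, standardized features, and the cluster mean as the centroid. The function name `cluster_centroid_features` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_centroid_features(X, n_clusters):
    """Group correlated columns of X and return one centroid per group.

    Assumed design (not taken from the paper): average linkage on the
    distance 1 - Pearson correlation, centroid = mean of standardized members.
    """
    # Standardize so no feature dominates a centroid through its scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Strongly positively correlated features end up at a distance near 0.
    dist = 1.0 - np.corrcoef(Z, rowvar=False)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # One column per cluster: the mean of the standardized member features.
    centroids = np.column_stack(
        [Z[:, labels == k].mean(axis=1) for k in np.unique(labels)]
    )
    return centroids, labels


# Synthetic example: two latent factors, each observed through three noisy copies.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
X = np.repeat(latent, 3, axis=1) + 0.1 * rng.normal(size=(60, 6))

centroids, labels = cluster_centroid_features(X, n_clusters=2)
print(labels)           # cluster label of each of the six features
print(centroids.shape)  # one centroid column per cluster
```

The resulting centroid matrix would then replace the raw feature matrix as input to LR, SVM, or DT. Because each group of correlated features collapses into a single column, the weight-dilution problem described above has less room to occur.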

RESULTS

Extensive experiments show that introducing cluster-based feature compaction and representation improves all regression models under fair hyperparameter tuning conditions. Although LASSO-LR and LR with clustered features performed similarly during training and validation, with LASSO-LR being slightly better, the cluster-based feature method performed significantly better on the test set, with an AUC of 0.91 and an accuracy of 0.86, demonstrating its advantage in stability and robustness. The overall best test performance was achieved by combining feature clustering to five representatives with SVM. An additional correlation study identified the individual features that most closely represent the cluster centroids: the exposure volumes of the rectum at 2 Gy, the trigone at 2 Gy and 41 Gy, the urethra at 42 Gy, and the rectal wall at 42 Gy. This indicates the importance of hot spot control of the urethra, trigone, and rectal wall for toxicity control.
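The correlation study mentioned above, which maps each cluster centroid back to an individual feature, could look roughly like the following. This is a sketch under assumptions rather than the authors' procedure: it picks the feature with the largest absolute Pearson correlation to each centroid. The helper `representative_features` and the placeholder names `f0` to `f3` are hypothetical.

```python
import numpy as np


def representative_features(X, centroids, names):
    """Return, for each centroid column, the name of the most correlated feature."""
    p = X.shape[1]
    # Correlate all columns at once; the upper-right block holds
    # feature-versus-centroid correlations (shape: features x centroids).
    corr = np.corrcoef(np.hstack([X, centroids]), rowvar=False)[:p, p:]
    best = np.abs(corr).argmax(axis=0)
    return [names[i] for i in best]


# Synthetic example: each centroid is dominated by one of four features.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
centroids = np.column_stack(
    [0.9 * X[:, 0] + 0.1 * X[:, 1], 0.2 * X[:, 2] + 0.8 * X[:, 3]]
)

reps = representative_features(X, centroids, ["f0", "f1", "f2", "f3"])
print(reps)
```

Reporting a single interpretable feature per centroid is what allows a clustered model to be read in terms of concrete dose-volume quantities.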

CONCLUSIONS

These findings underscore the superiority of the clustering method in mitigating correlation bias and enhancing predictive model accuracy. The current model also achieves state-of-the-art (SOTA) performance in predicting GU toxicity in MRI-guided prostate SBRT. Correlating dose features with the feature cluster centroids reveals the importance of hot spot control on the urethra, trigone, and rectal wall to reduce toxicity risk.
