García-Barceló Carmen, Gil David, Tomás David, Bernabeu David
University Institute for Computer Research, University of Alicante, Carretera San Vicente del Raspeig s/n, 03690 San Vicente del Raspeig, Spain.
Sensors (Basel). 2025 Jul 4;25(13):4184. doi: 10.3390/s25134184.
One of the main issues with paragangliomas and pheochromocytomas is that these tumors have up to a 20% rate of metastatic disease, which cannot be reliably predicted. While machine learning models hold great promise for enhancing predictive accuracy, their often opaque nature limits trust and adoption in critical fields such as healthcare. Understanding the factors that drive predictions is essential both for validating their reliability and for integrating them into clinical decision-making. In this paper, we propose an architecture that combines data mining, machine learning, and explainability techniques to improve predictions of metastatic disease in these types of cancer and to enhance trust in the models. We applied a wide variety of algorithms to develop predictive models, with a focus on interpreting their outputs to support clinical insights. Our methodology involves a comprehensive preprocessing phase to prepare the data, followed by the application of classification algorithms. Explainability techniques were integrated to provide insights into the key factors driving predictions. Additionally, a feature selection process was performed to identify the most influential variables and to explore how their inclusion affects model performance. The best-performing algorithm, Random Forest, achieved, among other metrics, an accuracy of 96.3%, a precision of 96.5%, and an AUC of 0.963, combining strong predictive capability with explainability that fosters trust in clinical applications.
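For readers less familiar with the metrics reported in the abstract (accuracy, precision, and AUC), the following is a minimal sketch of how each is conventionally defined for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made purely for illustration and have no connection to the study's data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FP): share of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case receives a higher score
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values, NOT from the study): 1 = metastatic, 0 = non-metastatic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]  # hypothetical model scores
y_pred = [int(s >= 0.5) for s in scores]           # assumed 0.5 decision threshold

print(accuracy(y_true, y_pred))   # 0.75
print(precision(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))    # 0.9375
```

Accuracy and precision depend on the chosen decision threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is why the three values are usually reported together.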