Woźniacki Andrzej, Książek Wojciech, Mrowczyk Patrycja
Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155 Cracow, Poland.
Oncology Clinical Department, The University Hospital in Cracow, Kopernika 50, 31-501 Cracow, Poland.
Cancers (Basel). 2024 Sep 20;16(18):3205. doi: 10.3390/cancers16183205.
Colorectal cancer is one of the most prevalent forms of cancer and is associated with a high mortality rate. Additionally, an increasing number of adults under 50 are being diagnosed with the disease. This underscores the importance of leveraging modern technologies, such as artificial intelligence, for early diagnosis and treatment support.
Eight classifiers were used in this study: Random Forest, XGBoost, CatBoost, LightGBM, Gradient Boosting, Extra Trees, k-nearest neighbors (KNN), and Decision Tree. Their hyperparameters were optimized using the Optuna, Ray Tune, and Hyperopt frameworks. The study was conducted on a public dataset from Brazil containing information on tens of thousands of patients.
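The tuning step can be illustrated with a minimal, dependency-free sketch. It is a stand-in built on stated assumptions, not the authors' pipeline. It tunes only a KNN classifier on synthetic two-class data, and it uses plain random search. By contrast, Optuna's default sampler and Hyperopt's `tpe.suggest` use the Tree-structured Parzen Estimator. All function names (`make_data`, `knn_predict`, `random_search`) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical helpers: a sketch of hyperparameter search, not the study's pipeline.

def make_data(n, seed):
    """Synthetic balanced two-class data: class 0 centred at (0, 0), class 1 at (1.5, 1.5)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 1.5 * label
        point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((point, label))
    rng.shuffle(data)
    return data

def distance(a, b, metric):
    if metric == "euclidean":
        return math.dist(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))  # manhattan

def knn_predict(train, x, k, metric):
    # Majority vote among the k nearest training points (odd k avoids ties for two classes).
    neighbours = sorted(train, key=lambda item: distance(item[0], x, metric))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(train, valid, k, metric):
    correct = sum(knn_predict(train, x, k, metric) == y for x, y in valid)
    return correct / len(valid)

def random_search(train, valid, n_trials, seed):
    """Sample n_trials configurations at random and keep the one with the best validation accuracy."""
    rng = random.Random(seed)
    best_params, best_acc = None, -1.0
    for _ in range(n_trials):
        params = {
            "k": rng.randrange(1, 26, 2),  # odd k in 1..25
            "metric": rng.choice(["euclidean", "manhattan"]),
        }
        acc = accuracy(train, valid, **params)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc

data = make_data(400, seed=0)
train, valid = data[:300], data[300:]
best_params, best_acc = random_search(train, valid, n_trials=20, seed=1)
print(best_params, round(best_acc, 3))
```

Replacing `random_search` with an Optuna study or a Hyperopt `fmin` call would keep the same objective (validation accuracy). Only the way candidate settings are proposed would change.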
The models developed in this study demonstrated high classification accuracy in predicting one-, three-, and five-year survival, as well as overall mortality and cancer-specific mortality. The CatBoost, LightGBM, Gradient Boosting, and Random Forest classifiers delivered the best performance, achieving an accuracy of approximately 80% across all the evaluated tasks.
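Predicting one-, three-, and five-year survival can be framed as three binary classification tasks, one per time horizon. The sketch below shows one plausible way to derive such labels from follow-up data. It rests on an assumption, because the abstract does not say how the authors defined the labels or handled patients lost to follow-up. The `Patient` fields and the `survival_label` helper are hypothetical, and the four-patient cohort is toy data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patient:
    # Hypothetical fields; the real dataset's column names are not given in the abstract.
    followup_years: float  # time from diagnosis to death or last contact
    died: bool             # True if death was observed during follow-up

def survival_label(p: Patient, horizon: float) -> Optional[int]:
    """Return 1 if alive at `horizon` years, 0 if died before it, None if censored too early to tell."""
    if p.died and p.followup_years < horizon:
        return 0
    if p.followup_years >= horizon:
        return 1
    return None  # lost to follow-up before the horizon: excluded from this task

cohort = [Patient(0.5, True), Patient(2.0, False), Patient(4.0, True), Patient(6.0, False)]
for horizon in (1, 3, 5):
    labels = [survival_label(p, horizon) for p in cohort]
    print(horizon, labels)
```

This sketch excludes patients censored before the horizon, so every label it keeps is observed. The cost is that the usable sample shrinks as the horizon grows.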
This research enabled the development of effective classification models that can be applied in clinical practice.