Núcleo de Sistemas Eletrônicos Embarcados, Instituto Mauá de Tecnologia, São Paulo, 09580-900, Brazil.
Information and Epidemiology, Fundação Oncocentro de São Paulo, São Paulo, 05409-012, Brazil.
Sci Rep. 2023 Jun 1;13(1):8874. doi: 10.1038/s41598-023-35649-9.
Colorectal cancer is one of the most common cancers worldwide, with almost 2 million new cases annually. The situation in Brazil is similar: around 41 thousand new cases were estimated in the last 3 years. This increase in cases heightens the interest in and importance of studies on the topic, especially those using new approaches. The use of machine learning algorithms in cancer studies has grown in recent years; besides making predictions from data, they can provide important information to medicine. In this study, five different classifications were performed, each considering patients' survival. Data were extracted from the Hospital Based Cancer Registries of São Paulo, coordinated by Fundação Oncocentro de São Paulo, and cover patients with colorectal cancer from São Paulo state, Brazil, treated between 2000 and 2021. The machine learning models provided both the predictions and the most important features for each algorithm. When part of the dataset was held out to validate the models, the predictors reached an accuracy of around 77%, with AUC close to 0.86, and clinical staging was the most important feature in all of them.
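As an illustration of the two validation metrics reported above, the following minimal sketch computes accuracy and AUC on a held-out set. It is not the study's code: the labels and scores are toy values invented for the example, the 0.5 decision threshold is an assumption, and the helper names `accuracy` and `roc_auc` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy held-out data (assumed for illustration only):
# 1 = patient survived, 0 = patient did not survive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.7, 0.3, 0.2, 0.6, 0.8, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures are usually reported together.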