Department of Neurobiology, Care Sciences and Society, Division of Family Medicine and Primary Care, Karolinska Institutet, Solna, Sweden; Academic Primary Health Care Centre, Region Stockholm, Stockholm, Sweden.
Institute of Medicine, Department of Community Medicine and Public Health, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
Eur J Cancer. 2023 Mar;182:100-106. doi: 10.1016/j.ejca.2023.01.011. Epub 2023 Jan 20.
Primary health care (PHC) is often the first point of contact when diagnosing colorectal cancer (CRC). Human limitations in processing large amounts of information warrant the use of machine learning as a diagnostic prediction tool for CRC.
To develop a predictive model for identifying non-metastatic CRC (NMCRC) among PHC patients using diagnostic data analysed with machine learning.
A case-control study using data on PHC visits for 542 patients aged >18 years who were diagnosed with NMCRC in the Västra Götaland Region, Sweden, during 2011, and for 2,139 matched controls.
Stochastic gradient boosting (SGB) was used to construct a model for predicting the presence of NMCRC from two kinds of predictors: diagnostic codes from PHC consultations during the year before the date of cancer diagnosis, and the total number of consultations. Variables with a normalised relative influence (NRI) >1% were considered to make an important contribution to the model. Risks of having NMCRC were calculated using odds ratios of marginal effects.
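The two quantities named in the method can be illustrated with a minimal sketch. It assumes that NRI is obtained by rescaling raw influence scores so that they sum to 100%, and that an odds ratio of a marginal effect compares the model's average predicted probability with a variable present against that with it absent; the abstract does not spell out either computation. The function names and the input numbers are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def normalised_relative_influence(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw influence scores so that they sum to 100 (percent)."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one variable must have positive influence")
    return {name: 100.0 * value / total for name, value in raw.items()}


def important_variables(nri: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """Return variables with NRI strictly above the threshold, largest first."""
    kept = [name for name, value in nri.items() if value > threshold]
    return sorted(kept, key=lambda name: nri[name], reverse=True)


def odds_ratio(p_with: float, p_without: float) -> float:
    """Odds ratio between two predicted probabilities (assumed definition)."""
    for p in (p_with, p_without):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    return (p_with / (1.0 - p_with)) / (p_without / (1.0 - p_without))


# Hypothetical raw influence scores, NOT values from the study.
raw_scores = {
    "anaemia": 30.0,
    "rectal_bleeding": 15.0,
    "consultations": 4.5,
    "cough": 0.5,
    "headache": 0.0,
}

nri = normalised_relative_influence(raw_scores)
selected = important_variables(nri)  # cough sits exactly at 1% and is excluded
example_or = odds_ratio(p_with=0.5, p_without=0.2)  # hypothetical probabilities
```

The threshold is applied as a strict inequality to match the ">1%" wording, so a variable sitting exactly at 1% is not counted as important.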
Of the 361 variables used as predictors in the SGB model, 184 had non-zero influence; 16 of these had an NRI >1%, with a combined NRI of 63.3%. Variables representing anaemia and bleeding had a combined NRI of 27.6%. The model had a sensitivity of 73.3% and a specificity of 83.5%. Change in bowel habit had the highest odds ratio of marginal effects, at 28.8.
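For readers less familiar with the performance measures reported above, the sketch below shows how sensitivity and specificity follow from a confusion matrix. The counts are hypothetical round numbers chosen for illustration; they are not the study's classification table, which the abstract does not report.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of actual cases that the model flags as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of actual controls that the model flags as controls."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT taken from the study.
sens = sensitivity(true_pos=80, false_neg=20)  # 80 of 100 cases detected
spec = specificity(true_neg=90, false_pos=10)  # 90 of 100 controls cleared
```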
Machine learning is useful for identifying variables of importance for predicting NMCRC in PHC. Malignant diagnoses may be hidden behind benign symptoms such as haemorrhoids.