Huna, São Paulo, Brazil.
Departamento de Ciências da Computação, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais/UFMG, Campus Belo Horizonte, Minas Gerais, Brazil.
Sci Rep. 2024 May 12;14(1):10841. doi: 10.1038/s41598-024-61215-y.
Optimizing early breast cancer (BC) detection requires effective risk assessment tools. This retrospective study from Brazil shows that machine learning can discern complex patterns in routine blood tests, offering a globally accessible and cost-effective approach to risk evaluation. We analyzed complete blood count (CBC) tests from 396,848 women aged 40-70 who underwent breast imaging or biopsy within six months after their CBC test. Of these, 2,861 (0.72%) were identified as cases: 1,882 with BC confirmed by anatomopathological tests and 979 with highly suspicious imaging (BI-RADS 5). The remaining 393,987 participants (99.28%), who had BI-RADS 1 or 2 results, were classified as controls. The database was divided into a modeling set (covering training and validation) and a testing set according to diagnostic certainty. The testing set comprised cases confirmed by anatomopathology and controls who remained cancer-free for 4.5-6.5 years after the CBC. Our ridge regression model, which incorporates neutrophil-lymphocyte ratio, red blood cells, and age, achieved an AUC of 0.64 (95% CI 0.64-0.65). These results are slightly better than those of a gradient-boosting model, LightGBM, and the ridge model has the added benefit of being fully interpretable. Using the model's probabilistic output, we divided the study population into four risk groups (high, moderate, average, and low), which had relative ratios of BC of 1.99, 1.32, 1.02, and 0.42, respectively. The aim of this stratification is to streamline prioritization and so potentially improve the early detection of breast cancer, particularly in resource-limited settings. As a risk stratification tool, the model offers the potential for personalized breast cancer screening by prioritizing women according to their individual risk, marking a shift away from a broad population-wide strategy.
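The cohort arithmetic above can be checked with a short sketch, which also illustrates one way a per-group relative ratio could be computed. The abstract does not define the relative ratio or give the risk-group cutoffs, so the definition used here (BC rate within a group divided by the BC rate in the whole population) is an assumption, and the group counts in `toy_groups` are hypothetical illustration values, not study data.

```python
def prevalence(cases: int, total: int) -> float:
    """Fraction of the population that are cases."""
    return cases / total


def relative_ratios(groups: dict, overall_rate: float) -> dict:
    """Assumed definition: group case rate divided by overall case rate.

    `groups` maps a group name to a (cases, total) pair.
    """
    return {name: (c / n) / overall_rate for name, (c, n) in groups.items()}


# Figures reported in the abstract.
total_women = 396_848
confirmed_bc = 1_882   # confirmed by anatomopathological tests
birads5 = 979          # highly suspicious imaging (BI-RADS 5)

cases = confirmed_bc + birads5
controls = total_women - cases
overall = prevalence(cases, total_women)
print(f"cases={cases}, controls={controls}, prevalence={overall:.2%}")

# Hypothetical counts, only to show how the ratio behaves.
toy_groups = {"high": (40, 1_000), "low": (10, 1_000)}
toy_overall = prevalence(50, 2_000)
ratios = relative_ratios(toy_groups, toy_overall)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Under this assumed definition, a ratio above 1 means the group carries more BC than the population average and a ratio below 1 means less, which is how the reported values of 1.99 (high risk) and 0.42 (low risk) would be read.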