Machine learning algorithms predict breast cancer incidence risk: a data-driven retrospective study based on biochemical biomarkers.

Authors

Guo Qianqian, Wu Peng, He Junhao, Zhang Ge, Zhou Wu, Chen Qianjun

Affiliations

State Key Laboratory of Traditional Chinese Medicine Syndrome/Breast Department, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangdong Provincial Academy of Chinese Medical Sciences, Guangzhou, Guangdong, China.

Chinese Medicine Guangdong Laboratory, Guangzhou, Guangdong, China.

Publication information

BMC Cancer. 2025 Jul 1;25(1):1061. doi: 10.1186/s12885-025-14444-x.

Abstract

BACKGROUND

Current breast cancer prediction models typically rely on personal information and medical history, with limited inclusion of blood-based biomarkers. This study aimed to identify novel breast cancer risk factors using machine learning algorithms. By integrating both personal clinical factors and peripheral blood biochemical biomarkers, it sought to enhance the understanding of breast cancer risk.

METHODS

Data were screened and normalized according to predefined inclusion and exclusion criteria. Logistic regression with forward selection and six other machine learning algorithms were employed to identify variables associated with breast cancer incidence. The performance of the models was evaluated using the area under the curve (AUC) through 5-fold cross-validation.
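The evaluation step described above (AUC estimated through 5-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the model is a one-feature logistic regression fitted by gradient descent, and the helper names (`auc`, `k_fold_indices`, `fit_logistic`, `cross_validated_auc`) are hypothetical.

```python
import math
import random


def auc(labels, scores):
    """Area under the ROC curve: the fraction of (case, control) pairs in
    which the case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_logistic(xs, ys, lr=0.1, epochs=200):
    """One-feature logistic regression fitted by batch gradient descent
    (a simplification; the study used many predictors)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def cross_validated_auc(xs, ys, k=5):
    """Fit on k-1 folds, score the held-out fold, and return one AUC per fold."""
    fold_aucs = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        w, b = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        scores = [w * xs[i] + b for i in held_out]
        fold_aucs.append(auc([ys[i] for i in held_out], scores))
    return fold_aucs


# Synthetic stand-in data (NOT the study data): one biomarker-like feature
# whose mean is higher in cases (label 1) than in controls (label 0).
rng = random.Random(42)
ys = [i % 2 for i in range(200)]
xs = [rng.gauss(1.0 if y == 1 else -1.0, 1.0) for y in ys]

fold_aucs = cross_validated_auc(xs, ys, k=5)
mean_auc = sum(fold_aucs) / len(fold_aucs)
print("per-fold AUC:", [round(a, 3) for a in fold_aucs])
print("mean AUC:", round(mean_auc, 3))
```

Each fold serves once as held-out data, so every observation contributes to exactly one AUC estimate, and the spread across folds gives a rough sense of how stable the estimate is.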

RESULTS

The data were divided into a training cohort of 17,360 cases and a testing cohort of 8,551 cases. Logistic regression analysis showed that breast cancer incidence increased with age (odds ratio [OR]: 1.136, 95% confidence interval [CI]: [1.130, 1.142], P < 0.001), gamma-glutamyl transferase (GGT) (OR: 1.002, 95% CI: [1.000, 1.004], P = 0.014), and alanine transaminase (ALT) (OR: 1.005, 95% CI: [1.001, 1.008], P = 0.008). Furthermore, the six machine learning algorithms consistently identified GGT and ALT as the most significant predictive features. The AUC values obtained from the six models after 5-fold cross-validation ranged from 0.779 to 0.862, with accuracy ranging from 0.780 to 0.841.
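To make the reported odds ratios easier to read, the sketch below shows how a per-unit OR from a logistic regression compounds over a larger change in the predictor. The abstract does not state the units, so treating the age OR as per year and the ALT OR as per unit of the measured value is an assumption made here for illustration; `odds_ratio_over` is a hypothetical helper, and the calculation assumes the log-odds are linear in the predictor.

```python
import math


def odds_ratio_over(or_per_unit, units):
    """Odds ratio for a change of `units` in a predictor, given its per-unit OR.

    In logistic regression the coefficient is beta = ln(OR), and the log-odds
    change linearly with the predictor, so OR(units) = exp(beta * units).
    """
    beta = math.log(or_per_unit)
    return math.exp(beta * units)


# ORs taken from the abstract; the unit of change is an assumption.
age_or_10 = odds_ratio_over(1.136, 10)   # assumed: 10-year difference in age
alt_or_20 = odds_ratio_over(1.005, 20)   # assumed: 20-unit difference in ALT

print(f"Age, +10 units: OR = {age_or_10:.2f}")
print(f"ALT, +20 units: OR = {alt_or_20:.2f}")
```

Under these assumptions, an OR of 1.136 per unit of age corresponds to roughly a 3.6-fold increase in odds over 10 units, while an OR of 1.005 for ALT remains a modest change (about 1.1-fold) even over 20 units.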

CONCLUSIONS

Our study identified two biochemical biomarkers (GGT and ALT) as promising indicators for breast cancer prediction. Future research should incorporate these findings into a tailored breast cancer risk prediction model.
