Su Na, Tang Rui, Zhang Yice, Ni Jiaqi, Huang Yimei, Liu Chunqi, Xiao Yuzhou, Zhu Baoting, Zhao Yinglan
West China School of Pharmacy, Sichuan University, Chengdu, China.
Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, China.
Front Pharmacol. 2024 Nov 21;15:1510220. doi: 10.3389/fphar.2024.1510220. eCollection 2024.
This study aimed to identify the risk factors for pancreatic cancer through machine learning.
We investigated the relationships between different risk factors and pancreatic cancer in a real-world retrospective cohort study conducted at West China Hospital of Sichuan University. Multivariable logistic regression, with pancreatic cancer as the outcome, was used to identify covariates associated with pancreatic cancer. The extreme gradient boosting (XGBoost) machine learning model was adopted as the final model because of its high performance. SHapley Additive exPlanations (SHAP) were used to visualize the relationships between these potential risk factors and pancreatic cancer.
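The abstract does not describe how the regression was fitted. As a rough illustration of what a multivariable logistic regression estimates, the sketch below fits one by Newton-Raphson (iteratively reweighted least squares) in pure Python on a synthetic cohort, then reports each factor's adjusted odds ratio as exp(coefficient). The feature names, prevalences, and true coefficients are hypothetical and chosen only for illustration; they are not the study's data or results, and a real analysis would normally rely on an established statistics package.

```python
import math
import random

# Hypothetical binary risk factors, named for illustration only.
FEATURES = ["kras_mutation", "pancreatitis", "hyperlipidaemia"]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iters=10):
    """Fit logit P(y=1) = b0 + sum_j b_j * x_j by Newton-Raphson; returns [b0, b1, ...]."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]  # X^T W X with W = diag(mu * (1 - mu))
        for xi, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)  # Newton step for maximizing the log-likelihood
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def simulate(n=2000, seed=0):
    """Synthetic cohort, NOT study data: binary exposures with known log-odds effects."""
    rng = random.Random(seed)
    true_beta = [-2.0, 1.5, 0.8, 0.5]  # intercept, then one coefficient per feature
    X, y = [], []
    for _ in range(n):
        x = [1.0 if rng.random() < prev else 0.0 for prev in (0.2, 0.15, 0.3)]
        eta = true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], x))
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
        X.append(x)
    return X, y


X, y = simulate()
beta = fit_logistic(X, y)
for name, b in zip(FEATURES, beta[1:]):
    # exp(coefficient) is the odds ratio adjusted for the other factors in the model
    print(f"{name}: adjusted OR = {math.exp(b):.2f}")
```

Because all exposures enter one model, each odds ratio is adjusted for the others, so a factor that merely co-occurs with a stronger one is not credited with that factor's effect.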
The cohort included 1,982 patients. The median ages of the pancreatic cancer and non-pancreatic cancer groups were 58.1 years (IQR: 51.3-64.4) and 57.5 years (IQR: 49.5-64.9), respectively. Multivariable logistic regression indicated that Kirsten rat sarcoma viral oncogene homolog (KRAS) gene mutation, hyperlipidaemia, pancreatitis, and pancreatic cysts were significantly associated with an increased risk of pancreatic cancer. The five highest-ranked features in the XGBoost model were KRAS gene mutation status, age, alcohol consumption status, pancreatitis status, and hyperlipidaemia status.
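The abstract ranks features by SHAP but does not give the computation. The sketch below illustrates the underlying idea with exact Shapley values obtained by enumerating every coalition of features, where features outside a coalition are replaced by a single baseline (here the column means), and global importance is taken as the mean absolute Shapley value across patients. This is a swapped-in simplification: the scoring function `toy_logit`, its coefficients, the patient rows, and the feature names are all hypothetical; the model is a hand-written log-odds formula rather than an XGBoost ensemble; and enumeration needs on the order of 2^n model evaluations, so it only suits a handful of features. Tree-specific algorithms such as TreeSHAP compute attributions for tree ensembles far more efficiently.

```python
import math
from itertools import combinations

# Hypothetical feature names and toy model; NOT the study's XGBoost model.
FEATURES = ["kras_mutation", "age_decades", "alcohol", "pancreatitis"]


def toy_logit(x):
    """Hypothetical log-odds score with a KRAS x pancreatitis interaction term."""
    kras, age, alcohol, panc = x
    return -3.0 + 1.5 * kras + 0.3 * age + 0.4 * alcohol + 0.8 * panc + 0.6 * kras * panc


def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def rank_features(f, rows, baseline):
    """Global importance: mean absolute Shapley value per feature, largest first."""
    totals = [0.0] * len(baseline)
    for x in rows:
        for j, v in enumerate(shapley_values(f, x, baseline)):
            totals[j] += abs(v)
    means = [t / len(rows) for t in totals]
    return sorted(zip(FEATURES, means), key=lambda kv: kv[1], reverse=True)


# Hypothetical patients: [KRAS mutation, age in decades, alcohol use, pancreatitis]
rows = [[1, 6.2, 0, 1], [0, 5.5, 1, 0], [1, 4.8, 1, 0], [0, 6.9, 0, 1]]
baseline = [sum(col) / len(rows) for col in zip(*rows)]  # column means as the baseline
for name, score in rank_features(toy_logit, rows, baseline):
    print(f"{name}: mean |Shapley value| = {score:.3f}")
```

A useful sanity check is the efficiency property: for each patient the Shapley values sum to `toy_logit(x) - toy_logit(baseline)`. The `0.6 * kras * panc` term also shows how an interaction is credited: its contribution is shared between KRAS and pancreatitis rather than assigned to either feature alone.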
Machine learning algorithms confirmed that KRAS gene mutation, hyperlipidaemia, and pancreatitis are potential risk factors for pancreatic cancer. Additionally, the coexistence of KRAS gene mutation and pancreatitis, as well as of KRAS gene mutation and pancreatic cysts, is associated with an increased risk of pancreatic cancer. Our findings have useful implications for public health strategies targeting the prevention and early detection of pancreatic cancer.
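The abstract does not state how the joint effect of coexisting factors was quantified. One standard way, given here as an assumption rather than the authors' method, is a logistic model with a product term. With hypothetical binary indicators K (KRAS mutation) and P (pancreatitis), other covariates z held fixed, and coefficients β:

```latex
\operatorname{logit} \Pr(Y = 1 \mid K, P, \mathbf{z})
  = \beta_0 + \beta_K K + \beta_P P + \beta_{KP}\, K P + \boldsymbol{\gamma}^{\top} \mathbf{z}

\mathrm{OR}_{11} = e^{\beta_K + \beta_P + \beta_{KP}}, \qquad
\mathrm{OR}_{10} = e^{\beta_K}, \qquad
\mathrm{OR}_{01} = e^{\beta_P}, \qquad
\frac{\mathrm{OR}_{11}}{\mathrm{OR}_{10}\,\mathrm{OR}_{01}} = e^{\beta_{KP}}
```

Each OR compares the indicated (K, P) combination with K = 0, P = 0. A positive β_KP means the odds ratio for having both factors exceeds the product of the two separate odds ratios; β_KP = 0 means the joint effect is exactly multiplicative on the odds scale.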