Shao Xiaomei, Zhang Ling, Wang Yuting, Ying Youmei, Chen Xueqin
Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Jiangsu, China.
Huai'an No. 3 People's Hospital, Huaian Second Clinical College of Xuzhou Medical University, Jiangsu, China.
Front Public Health. 2025 Jul 10;13:1602566. doi: 10.3389/fpubh.2025.1602566. eCollection 2025.
Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide, yet strategies for early detection remain limited. Although previous studies have examined the relationship between per- and polyfluoroalkyl substances (PFAS) and COPD, few have applied interpretable machine learning (ML) techniques to this association.
We investigated the association between PFAS exposure and COPD risk in 4,450 National Health and Nutrition Examination Survey (NHANES) participants from 2013 to 2018. After excluding participants with missing covariates or extreme PFAS values and applying K-nearest neighbors (KNN) imputation, nine ML models, including CatBoost, were built and evaluated on accuracy, area under the curve (AUC), sensitivity, and specificity, among other metrics. The best-performing model was further analyzed using partial dependence plots (PDP) and SHapley Additive exPlanations (SHAP) analysis. To enhance clinical applicability, the final model was deployed as a publicly accessible web-based risk calculator.
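The evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the labels, the scores, and the 0.5 decision threshold are invented toy values, and the function names (`confusion_counts`, `roc_auc`, `classification_metrics`) are hypothetical. AUC is computed here as the fraction of (case, non-case) pairs in which the case receives the higher score, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, y_score):
    """AUC as the probability that a random case outscores a random non-case."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity at a threshold, plus AUC."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "auc": roc_auc(y_true, y_score),
    }


# Toy data (assumption): 1 = COPD, 0 = no COPD; scores are invented.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

metrics = classification_metrics(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.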
CatBoost emerged as the best model, achieving an accuracy of 84%, AUC of 0.89, sensitivity of 81%, and specificity of 84%. PDP revealed that higher perfluorooctane sulfonic acid (PFOS) and perfluoroundecanoic acid (PFUA) levels were associated with reduced COPD risk, whereas perfluorooctanoic acid (PFOA) and 2-(N-Methyl-perfluorooctane sulfonamido) acetic acid (MPAH) showed positive associations with COPD. Perfluorononanoic acid (PFNA), perfluorodecanoic acid (PFDE), and perfluorohexane sulfonic acid (PFHxS) demonstrated mixed or non-linear effects. SHAP analysis provided insights into individual predictions and overall variable contributions, clarifying the complex PFAS-COPD relationship. The deployed web-based calculator enables interactive prediction and risk interpretation, supporting potential public health applications.
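To make the partial dependence idea concrete, the following sketch computes a one-feature partial dependence curve by hand: for each grid value, the feature is set to that value in every row and the model's predictions are averaged. Everything here is an assumption for illustration. `toy_predict` is a hypothetical logistic function standing in for the fitted CatBoost model, its coefficients are invented (signs chosen only to mirror the directions reported above for PFOA and PFOS), and the rows and grid are made-up values, not NHANES data.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the input rows are not mutated
            modified[feature] = value  # override only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row):
    """Hypothetical stand-in model: invented coefficients, not the paper's."""
    z = -1.0 + 0.8 * row["pfoa"] - 0.5 * row["pfos"]
    return 1.0 / (1.0 + math.exp(-z))


# Invented exposure values (arbitrary units), for illustration only.
rows = [
    {"pfoa": 1.0, "pfos": 2.0},
    {"pfoa": 2.0, "pfos": 1.0},
    {"pfoa": 0.5, "pfos": 3.0},
]
grid = [0.0, 1.0, 2.0, 3.0]

pd_pfoa = partial_dependence(toy_predict, rows, "pfoa", grid)
pd_pfos = partial_dependence(toy_predict, rows, "pfos", grid)

print("PFOA curve:", [round(v, 3) for v in pd_pfoa])
print("PFOS curve:", [round(v, 3) for v in pd_pfos])
```

A rising curve indicates that the model predicts higher risk as the feature increases, and a falling curve the opposite. The curve describes the behavior of the fitted model averaged over the sample; it does not by itself establish a causal effect of the exposure.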
CatBoost identified PFOS and PFUA as protective factors against COPD, while PFOA and MPAH increased risk of COPD. These findings emphasize the need for stricter PFAS regulation and highlight the potential of machine learning in guiding prevention strategies.