Mi Yi, Sun Pin
Department of Occupational Health & Toxicology, School of Public Health, Fudan University, Shanghai 200032, PR China.
Hear Res. 2025 Jun;461:109252. doi: 10.1016/j.heares.2025.109252. Epub 2025 Mar 30.
The prevalence of hearing loss (HL) has emerged as an escalating public health concern globally. The objective of this study was to leverage data from the National Health and Nutrition Examination Survey (NHANES) to develop an interpretable predictive machine learning (ML) model for HL. In accordance with the established inclusion and exclusion criteria, a total of 2814 participants were randomly assigned to one of two groups for the training and validation of the predictive models. We identified the most significant variables using Recursive Feature Elimination and constructed an HL prediction model using various ML models. The generalization ability of the models was evaluated via 10-fold cross-validation. Eight different models were compared to find the optimal prediction model for HL. Subsequently, three interpretation methods, feature importance analysis, generalized linear models (GLMs) and restricted cubic splines (RCS), were integrated into a pipeline and embedded in the ML workflow for model interpretation. In this study, the Random Forest (RF) exhibited superior performance across all evaluation metrics after balancing the data using the Synthetic Minority Oversampling Technique (SMOTE), particularly excelling in AUC, PR-AUC and F1 score. Feature importance analysis uncovered significant correlations between HL and the top 10 features, including age, blood lead (Pb) level, urine thallium (Tl) level, BMI, total energy, urine antimony (Sb) level, vitamin E intake, urine cobalt (Co) level, calcium intake and urine cesium (Cs) level. Moreover, both univariate and multivariate GLMs identified blood Pb [OR (95% CI): 1.169 (1.037, 1.311)] and vitamin E intake [OR (95% CI): 0.776 (0.641, 0.928)] as the main features associated with HL. The RCS analysis further revealed that increased blood Pb level and decreased vitamin E intake corresponded to a proportional rise in the predicted risk of HL after adjustment for confounders.
Our ML models identify key factors that, if validated by future studies, will have important implications for hearing conservation. Furthermore, these ML-based point-of-care prediction models will help overcome barriers to hearing healthcare and enable the efficient allocation of resources by accurately identifying individuals who are in dire need of hearing assessment.
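To make the reported GLM odds ratios easier to read, the following minimal sketch converts each odds ratio and its 95% CI back to the log-odds scale on which a logistic GLM is fitted. It is an illustration, not the authors' code. It assumes the intervals are Wald intervals that are symmetric on the log scale, and that each odds ratio refers to one unit of the predictor as scaled in the model (the abstract does not state the unit). The function names are hypothetical.

```python
import math

def or_to_log_scale(or_point, ci_low, ci_high, z_crit=1.96):
    """Recover the log-odds coefficient, an approximate standard error,
    and a Wald z statistic from a reported odds ratio and 95% CI.

    Assumption: the CI is a Wald interval, symmetric on the log scale.
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta, se, beta / se

def odds_ratio_for_change(beta, delta):
    """Odds ratio implied by a change of `delta` units in the predictor."""
    return math.exp(beta * delta)

# Values as reported in the abstract: OR, lower bound, upper bound.
reported = {
    "blood_pb": (1.169, 1.037, 1.311),
    "vitamin_e": (0.776, 0.641, 0.928),
}

results = {name: or_to_log_scale(*vals) for name, vals in reported.items()}

for name, (beta, se, z) in results.items():
    print(f"{name}: beta={beta:.4f}, se={se:.4f}, z={z:.2f}")
```

A positive coefficient (blood Pb) means higher odds of HL per unit increase, and a negative one (vitamin E intake) means lower odds. Because both intervals exclude 1, the corresponding z statistics exceed 1.96 in absolute value.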