Department of Epidemiology and Biostatistics, School of Public Health, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China.
Hangzhou Center for Disease Control and Prevention, Hangzhou, Zhejiang, China.
BMC Public Health. 2024 Apr 25;24(1):1160. doi: 10.1186/s12889-024-18636-1.
Hearing impairment (HI) has become a major public health issue in China. Because of the limitations of primary health care, the gold standard for HI diagnosis (the pure-tone hearing test) is currently not suitable for large-scale use in community settings. This study therefore aimed to develop a cost-effective HI screening model for the general population, using machine learning (ML) methods and data gathered in community-based scenarios, to help improve the hearing-related health outcomes of community residents.
This study recruited 3371 community residents from 7 health centres in Zhejiang, China. Sixty-eight indicators derived from questionnaire surveys and routine haematological tests were collected and used for modelling. Seven commonly used ML models (naive Bayes (NB), K-nearest neighbours (KNN), support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), boosting, and least absolute shrinkage and selection operator (LASSO) regression) were compared to develop the final high-frequency hearing impairment (HFHI) screening model for community residents. The model was presented as a nomogram that yields a risk score for an individual's probability of having HFHI. Based on this risk score, the population was divided into three risk strata (low, medium and high), and the risk factor characteristics of each dimension were identified for each stratum.
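The modelling and stratification steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification` to match only the sample size and the number of indicators), the LASSO model is approximated by an L1-penalised logistic regression, and the 70/30 split, the penalty strength `C=0.1` and the tertile cut-offs for the low, medium and high strata are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the study data: 3371 rows and 68 indicators.
# The values are random and carry no clinical meaning.
X, y = make_classification(
    n_samples=3371, n_features=68, n_informative=12, random_state=0
)

# Assumed 70/30 train/validation split (the abstract does not state the split).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardise so that the L1 penalty treats all indicators on the same scale.
scaler = StandardScaler().fit(X_train)

# L1-penalised (LASSO-type) logistic regression; C=0.1 is an arbitrary choice.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(scaler.transform(X_train), y_train)

# Predicted probability of the positive class serves as the risk score.
risk = model.predict_proba(scaler.transform(X_val))[:, 1]
auc = roc_auc_score(y_val, risk)

# Indicators whose coefficients were not shrunk to exactly zero.
n_selected = int(np.count_nonzero(model.coef_))

# Assumed cut-offs: tertiles of the validation risk scores.
low_cut, high_cut = np.quantile(risk, [1 / 3, 2 / 3])
strata = np.digitize(risk, [low_cut, high_cut])  # 0 = low, 1 = medium, 2 = high

# Observed positive rate within each stratum.
observed_rate = [float(y_val[strata == k].mean()) for k in range(3)]

print(f"validation AUC: {auc:.3f}")
print(f"indicators retained by the L1 penalty: {n_selected} of {X.shape[1]}")
for name, rate in zip(["low", "medium", "high"], observed_rate):
    print(f"{name}-risk stratum, observed positive rate: {rate:.2%}")
```

The L1 penalty is what makes this family of models attractive for screening: it shrinks some coefficients to exactly zero, so the fitted model doubles as a feature selector and the retained indicators can be carried into a nomogram.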
Among all the algorithms used, the LASSO-based model achieved the best performance on the validation set, attaining an area under the curve (AUC) of 0.868 (95% confidence interval (CI): 0.847-0.889), with precision, specificity and F-score values all greater than 80%. Five demographic indicators, 7 disease-related features, 5 behavioural factors, 2 environmental exposures, 2 hearing cognitive factors, and 13 blood test indicators were identified in the final screening model. A total of 91.42% (1129/1235) of the subjects in the high-risk group were confirmed to have HI by audiometry, a proportion 3.99 times that in the low-risk group (22.91%, 301/1314). The high-risk population was mainly characterized as older, low-income and less-educated males, especially those with multiple chronic conditions, noise exposure, poor lifestyle, abnormal blood indices (e.g., red cell distribution width (RDW) and platelet distribution width (PDW)) and liver function indicators (e.g., triglyceride (TG), indirect bilirubin (IBIL), aspartate aminotransferase (AST) and low-density lipoprotein (LDL)). An HFHI nomogram was further generated to improve the operability of the screening model for community applications.
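The comparison between the high-risk and low-risk groups is simple arithmetic and can be checked directly. The sketch below reads the high-risk figures as 1129 confirmed cases out of 1235 subjects, since that is the only ordering of the two numbers that reproduces the reported 91.42%; the helper name `proportion` is introduced here for illustration only.

```python
def proportion(cases: int, total: int) -> float:
    """Return cases / total, rejecting counts that cannot form a proportion."""
    if total <= 0 or not 0 <= cases <= total:
        raise ValueError(f"invalid counts: {cases}/{total}")
    return cases / total


# Counts taken from the abstract (high-risk group read as 1129 of 1235).
high_risk = proportion(1129, 1235)
low_risk = proportion(301, 1314)

# Ratio of the two confirmed-HI proportions.
ratio = high_risk / low_risk

print(f"high-risk group: {high_risk:.2%}")
print(f"low-risk group:  {low_risk:.2%}")
print(f"ratio:           {ratio:.2f}")
```

Rounded to two decimals, the three printed values agree with the 91.42%, 22.91% and 3.99 reported in the results.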
The HFHI risk screening model developed with ML algorithms can more accurately identify residents with HFHI by assigning them to the high-risk group. This in turn helps to identify modifiable and immutable risk factors among residents at high risk of HI and to promote personalized HI prevention or intervention.