Kim Jinheum, Youn Kanwoo, Park Jinwoo
Department of Applied Statistics, University of Suwon, Hwaseong 18323, Republic of Korea.
Department of Occupational & Environmental Medicine, Wonjin Green Hospital, Seoul 02221, Republic of Korea.
Healthcare (Basel). 2024 Oct 11;12(20):2026. doi: 10.3390/healthcare12202026.
BACKGROUND/OBJECTIVES: This study investigated factors influencing the prevalence of musculoskeletal disorders (MSDs) resulting from agricultural work, using the 2020 and 2022 occupational disease survey data collected by the Rural Development Administration. The combined data from these years indicated a 6.02% prevalence of MSDs, a marked class imbalance in the binary response variable. Such an imbalance can lead classifiers to overlook rare events while still reporting high accuracy.
METHODS: We evaluated five distinct models and compared their performance on both the original data and synthetically generated data. In the multivariate logistic model, we focused on the main effects of the covariates because no second-order interactions were statistically significant.
RESULTS: Under the random over-sampling examples (ROSE) method, gender, age, and pesticide use were particularly impactful. The odds of experiencing MSDs were 1.29 times higher for females than males. The odds increased with age: 2.66 times higher for those aged 50-59, 4.60 times higher for those aged 60-69, and 7.16 times higher for those aged 70 or older, compared to those under 50. Pesticide use was associated with 1.26 times higher odds of developing MSDs. Among body part usage variables, all except wrists and knees were significant. Farmers who frequently used their necks, arms, and waist showed 1.27, 1.11, and 1.23 times higher odds of developing MSDs, respectively.
CONCLUSIONS: The raw method achieved high accuracy, but the ROSE method outperformed it on precision and F1 score; the two methods showed similar AUC.
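The point about class imbalance inflating accuracy can be illustrated with a small sketch. It is not the authors' analysis: the cohort size of 10,000 and the helper `confusion_metrics` are assumptions made here for illustration, and only the 6.02% prevalence comes from the abstract. The sketch scores a trivial classifier that labels every respondent as MSD-free.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts.

    Hypothetical helper for illustration; undefined ratios are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Assumed cohort size (not from the source); prevalence of 6.02% is from the abstract.
n_respondents = 10_000
n_cases = 602                      # 6.02% of 10,000
n_non_cases = n_respondents - n_cases

# A classifier that never predicts an MSD: every case becomes a false negative.
always_negative = confusion_metrics(tp=0, fp=0, fn=n_cases, tn=n_non_cases)

for name, value in always_negative.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the do-nothing classifier scores about 94% accuracy while its recall and F1 are zero, which is why precision, F1 score, and AUC are more informative than accuracy alone when the positive class is rare.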