Occupation classification model based on DistilKoBERT: using the 5th and 6th Korean Working Condition Surveys.

Authors

Kim Tae-Yeon, Baek Seong-Uk, Lim Myeong-Hun, Yun Byungyoon, Paek Domyung, Zoh Kyung Ehi, Youn Kanwoo, Lee Yun Keun, Kim Yangho, Kim Jungwon, Choi Eunsuk, Kang Mo-Yeol, Cho YoonHo, Lee Kyung-Eun, Sim Juho, Oh Juyeon, Park Heejoo, Lee Jian, Won Jong-Uk, Lee Yu-Min, Yoon Jin-Ha

Affiliations

Department of Occupational and Environmental Medicine, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.

The Institute for Occupational Health, Yonsei University College of Medicine, Seoul, Korea.

Publication information

Ann Occup Environ Med. 2024 Aug 6;36:e19. doi: 10.35371/aoem.2024.36.e19. eCollection 2024.

DOI: 10.35371/aoem.2024.36.e19
PMID: 39188666
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11345209/
Abstract

BACKGROUND

Accurate occupation classification is essential in various fields, including policy development and epidemiological studies. This study aims to develop an occupation classification model based on DistilKoBERT.

METHODS

This study used data from the 5th and 6th Korean Working Conditions Surveys conducted in 2017 and 2020, respectively. A total of 99,665 survey participants, who were nationally representative of Korean workers, were included. We used natural language responses regarding their job responsibilities and occupational codes based on the Korean Standard Classification of Occupations (7th version, 3-digit codes). The dataset was randomly split into training and test datasets in a ratio of 7:3. The occupation classification model based on DistilKoBERT was fine-tuned using the training dataset, and the model was evaluated using the test dataset. The accuracy, precision, recall, and F1 score were calculated as evaluation metrics.
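The random 7:3 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function name `split_train_test`, the fixed seed, and the sample records are all assumptions made for the example.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Randomly partition records into a training list and a test list."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up (job description, 3-digit code) pairs, for illustration only.
records = [(f"job description {i}", str(100 + i % 5)) for i in range(10)]
train, test = split_train_test(records)
print(len(train), len(test))
```

A plain random split like this does not guarantee that rare occupational codes appear in both partitions; whether the study stratified by code is not stated in the abstract.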

RESULTS

The final model, which classified 28,996 survey participants in the test dataset into 142 occupational codes, exhibited an accuracy of 84.44%. For the evaluation metrics, the precision, recall, and F1 score of the model, calculated by weighting based on the sample size, were 0.83, 0.84, and 0.83, respectively. The model demonstrated high precision in the classification of service and sales workers yet exhibited low precision in the classification of managers. In addition, it displayed high precision in classifying occupations prominently represented in the training dataset.
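To make "calculated by weighting based on the sample size" concrete, the sketch below computes support-weighted precision, recall, and F1 in plain Python. It assumes the usual definition, in which each class's score is weighted by its share of the true labels; the abstract does not spell out the exact formula, and the function name and 3-digit labels here are made up for illustration.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), class scores weighted by support."""
    support = Counter(y_true)    # number of true instances per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total       # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(correct.values()) / total
    return accuracy, precision, recall, f1


# Made-up labels: one sample of class "111" is misclassified as "422".
y_true = ["422", "422", "422", "111", "111", "251"]
y_pred = ["422", "422", "422", "111", "422", "251"]
accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Under this definition, support-weighted recall is algebraically equal to overall accuracy, which is consistent with the reported recall of 0.84 alongside an accuracy of 84.44%.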

CONCLUSIONS

This study developed an occupation classification system based on DistilKoBERT, which demonstrated reasonable performance. Although further efforts are needed to enhance classification accuracy, this automated occupation classification model holds promise for advancing epidemiological studies in the fields of occupational safety and health.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3265/11345209/58fb4bf7665b/aoem-36-e19-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3265/11345209/2249d341d7f5/aoem-36-e19-g001.jpg

