Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC).

Authors

Bao Hongchang, Baker Christopher J O, Adisesh Anil

Affiliations

Department of Computer Science, Faculty of Science, Applied Science and Engineering, University of New Brunswick, Saint John, NB, Canada.

Department of Computing Science, University of Alberta, Edmonton, AB, Canada.

Publication information

JMIR Form Res. 2020 Aug 5;4(8):e16422. doi: 10.2196/16422.

DOI: 10.2196/16422
PMID: 32755893
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7439137/
Abstract

BACKGROUND

In many research studies, the identification of social determinants is an important activity; in particular, information about occupations is frequently added to existing patient data. Such information is usually solicited during interviews with open-ended questions such as "What is your job?" and "What industry sector do you work in?" Before this information can be used for further analysis, the responses need to be categorized using a coding system, such as the Canadian National Occupational Classification (NOC). Manual coding is the usual method, but it is time-consuming and error-prone, which makes it a good candidate for automation.

OBJECTIVE

This study aims to facilitate automated coding by introducing a rigorous algorithm that will be able to identify the NOC (2016) codes using only job title and industry information as input. Using manually coded data sets, we sought to benchmark and iteratively improve the performance of the algorithm.

METHODS

We developed the ACA-NOC algorithm based on the NOC (2016), which allowed users to match NOC codes with job and industry titles. We employed several different search strategies in the ACA-NOC algorithm to find the best match, including exact search, minor exact search, like search, near (same order) search, near (different order) search, any search, and weak match search. In addition, a filtering step based on the hierarchical structure of the NOC data was applied to the algorithm to select the best matching codes.
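The cascade of search strategies described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract only names the strategies, so the matching rule given for each one here is an assumption, only five of the seven named strategies are shown, and the hierarchical filtering step is omitted. The three-entry `NOC_TITLES` index and the function names are hypothetical; a real run would load the official NOC 2016 title index, and the codes shown should be checked against it.

```python
import re

# Hypothetical toy index; a real run would load the full NOC 2016 title list.
NOC_TITLES = {
    "registered nurse": "3012",
    "carpenter": "7271",
    "elementary school teacher": "4032",
}

def tokens(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def phrase(toks):
    """Space-padded phrase so substring tests respect word boundaries."""
    return " " + " ".join(toks) + " "

def is_subsequence(needle, haystack):
    """True if needle's tokens appear in haystack in the same order (gaps allowed)."""
    it = iter(haystack)
    return all(tok in it for tok in needle)

# Ordered from strictest to loosest; each rule is an assumed interpretation.
STRATEGIES = [
    ("exact", lambda q, t: q == t),
    ("like", lambda q, t: phrase(t) in phrase(q)),
    ("near_same_order", lambda q, t: is_subsequence(t, q)),
    ("near_any_order", lambda q, t: set(t) <= set(q)),
    ("any", lambda q, t: bool(set(t) & set(q))),
]

def code_title(query):
    """Return (strategy_name, candidate_codes) from the first strategy that hits."""
    q = tokens(query)
    for name, matches in STRATEGIES:
        hits = sorted({code for title, code in NOC_TITLES.items()
                       if matches(q, tokens(title))})
        if hits:
            return name, hits
    return None, []

print(code_title("Registered Nurse"))
print(code_title("nurse, registered"))
print(code_title("astronaut"))
```

Trying the strictest strategy first and stopping at the first hit means a looser strategy is used only when every stricter one fails, which is one plausible reading of "find the best match".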

RESULTS

The ACA-NOC was applied to over 500 manually coded job and industry titles. The accuracy rate at the four-digit NOC code level was 58.7% (332/566) and improved when broader job categories were considered (65.0% at the three-digit NOC code level, 72.3% at the two-digit NOC code level, and 81.6% at the one-digit NOC code level).
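The accuracy figures at coarser levels follow from comparing only the leading digits of each code, since NOC 2016 codes are hierarchical (the first digit is the broad category and all four digits identify the unit group). The sketch below shows one way to compute such level-wise accuracy. The function name and the five (predicted, manual) pairs are made up for illustration and are not data from the study; only the 332/566 figure comes from the abstract.

```python
def accuracy_by_level(pairs, levels=(4, 3, 2, 1)):
    """Fraction of (predicted, manual) code pairs that agree on the first k digits."""
    total = len(pairs)
    return {
        k: sum(pred[:k] == gold[:k] for pred, gold in pairs) / total
        for k in levels
    }

# Synthetic (predicted, manual) pairs, chosen so each agrees at a different depth.
pairs = [
    ("3012", "3012"),  # agrees at all four digits
    ("3012", "3011"),  # agrees up to three digits
    ("3012", "3031"),  # agrees up to two digits
    ("3012", "3124"),  # agrees on the first digit only
    ("7271", "3012"),  # no agreement
]

for level, acc in accuracy_by_level(pairs).items():
    print(f"{level}-digit accuracy: {acc:.1%}")

# The reported four-digit rate can be checked directly from the counts.
reported = round(332 / 566 * 100, 1)
print(reported)
```

Because agreement on k digits implies agreement on every shorter prefix, accuracy can only stay the same or rise as the level becomes coarser, which matches the pattern in the reported results.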

CONCLUSIONS

The ACA-NOC is a rigorous algorithm for automatically coding job titles to the Canadian NOC system and has been evaluated using real-world data. It allows researchers to code moderate-sized data sets containing occupation information in a timely and cost-efficient manner so that further analytics are possible. Initial assessments indicate that it has state-of-the-art performance and is readily extensible upon further benchmarking on larger data sets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1720/7439137/17265e69589b/formative_v4i8e16422_fig1.jpg
