Evaluation of the updated SOCcer v2 algorithm for coding free-text job descriptions in three epidemiologic studies.

Affiliations

Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, United States.

Data Science and Engineering Research Group, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, United States.

Publication information

Ann Work Expo Health. 2023 Jul 6;67(6):772-783. doi: 10.1093/annweh/wxad020.

DOI: 10.1093/annweh/wxad020
PMID: 37071789
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10324641/
Abstract

OBJECTIVES

Computer-assisted coding of job descriptions to standardized occupational classification codes facilitates evaluating occupational risk factors in epidemiologic studies by reducing the number of jobs needing expert coding. We evaluated the accuracy of the second version of SOCcer, a computerized algorithm designed to assign US SOC-2010 codes to jobs based on free-text job titles and work tasks.

METHODS

SOCcer was updated to v2 by expanding the training data to include jobs from several epidemiologic studies and revising the algorithm to account for nonlinearity and incorporate interactions. We evaluated the agreement between codes assigned by experts and the highest scoring code from SOCcer v1 and v2 (the score is a measure of confidence in the algorithm-predicted assignment) in 14,714 jobs from three epidemiologic studies. We also linked exposure estimates for 258 agents in the job-exposure matrix CANJEM to the expert-assigned and SOCcer v2-assigned codes and compared those estimates using kappa and intraclass correlation coefficients (ICCs). Analyses were stratified by SOCcer score, score distance between the top two scoring codes from SOCcer, and features from CANJEM.
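The kappa comparison mentioned above can be illustrated with a minimal sketch. It assumes each job has been reduced to a binary exposed/unexposed flag for one agent, once via the expert code and once via the algorithm code; the abstract does not say how the CANJEM estimates were categorized, so the input format and the function name `cohens_kappa` are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical exposure flags (1 = exposed) for eight jobs and one agent.
exposed_via_expert_code = [1, 1, 0, 0, 1, 0, 1, 0]
exposed_via_soccer_code = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(exposed_via_expert_code, exposed_via_soccer_code)
```

In this toy input the two sources agree on 6 of 8 jobs and chance agreement is 0.5, so kappa is 0.5.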

RESULTS

SOCcer v2's agreement with expert-assigned codes at the 6-digit level was 50%, compared to 44% for v1, and was similar for the three studies (38%-45%). Overall agreement for v2 at the 2-, 3-, and 5-digit levels was 73%, 63%, and 56%, respectively. For v2, median ICCs for the CANJEM probability and intensity metrics were 0.67 (IQR 0.59-0.74) and 0.56 (IQR 0.50-0.60), respectively. The agreement between the expert-assigned and SOCcer-assigned codes increased linearly with SOCcer score. The agreement also improved when the top two scoring codes had larger differences in score.
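Agreement at coarser digit levels is commonly computed by comparing code prefixes. The sketch below assumes SOC-2010 codes written as `NN-NNNN` and treats two codes as agreeing at the d-digit level when their first d digits match; the function names and the sample codes are illustrative and are not taken from the study.

```python
def soc_prefix(code: str, digits: int) -> str:
    """Return the first `digits` digits of a SOC code such as '47-2031'."""
    return code.replace("-", "")[:digits]


def agreement_by_level(expert_codes, algorithm_codes, levels=(2, 3, 5, 6)):
    """Proportion of jobs whose expert and algorithm codes share a prefix at each level."""
    if len(expert_codes) != len(algorithm_codes) or not expert_codes:
        raise ValueError("code lists must be non-empty and of equal length")
    n = len(expert_codes)
    result = {}
    for digits in levels:
        matches = sum(
            soc_prefix(e, digits) == soc_prefix(a, digits)
            for e, a in zip(expert_codes, algorithm_codes)
        )
        result[digits] = matches / n
    return result


# Hypothetical expert and algorithm assignments for four jobs.
expert = ["47-2031", "29-1141", "11-1021", "53-3032"]
algorithm = ["47-2031", "29-1171", "13-1111", "53-3033"]
agreement = agreement_by_level(expert, algorithm)
```

Because a match at a finer level implies a match at every coarser level, agreement can only stay the same or rise as the number of digits decreases, which is the pattern in the reported 50%, 56%, 63%, and 73%.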

CONCLUSIONS

Overall agreement with SOCcer v2 applied to job descriptions from North American epidemiologic studies was similar to the agreement usually observed between two experts. SOCcer's score predicted agreement with experts and can be used to prioritize jobs for expert review.
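One way to use the score for prioritizing expert review is sketched below. It flags a job when its top score is low or when the top two scores are close, the two conditions the results associate with lower agreement. The `CodedJob` record, the threshold values, and the sample jobs are hypothetical and would need to be chosen from a study's own validation data.

```python
from dataclasses import dataclass


@dataclass
class CodedJob:
    job_id: str
    top_code: str        # highest scoring SOC code from the algorithm
    top_score: float     # score of the highest scoring code
    second_score: float  # score of the second highest scoring code


def needs_expert_review(job, min_score=0.3, min_gap=0.1):
    """Flag jobs with a low top score or a small gap between the top two scores."""
    return job.top_score < min_score or (job.top_score - job.second_score) < min_gap


def review_queue(jobs, min_score=0.3, min_gap=0.1):
    """Return flagged jobs, lowest top score first."""
    flagged = [j for j in jobs if needs_expert_review(j, min_score, min_gap)]
    return sorted(flagged, key=lambda j: j.top_score)


# Hypothetical algorithm output for four jobs.
jobs = [
    CodedJob("A", "47-2031", 0.92, 0.03),  # confident, clear winner
    CodedJob("B", "29-1141", 0.25, 0.05),  # low top score
    CodedJob("C", "11-1021", 0.50, 0.45),  # top two codes nearly tied
    CodedJob("D", "53-3032", 0.12, 0.08),  # low score and small gap
]
queue_ids = [j.job_id for j in review_queue(jobs)]
```

Here jobs D, B, and C are queued for review in that order, and job A keeps its algorithm-assigned code.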
