Madrid-García Alfredo, Pérez-Sancristobal Inés, Leon Leticia, Abásolo Lydia, Fernández-Gutiérrez Benjamín, Rodríguez-Rodríguez Luis
Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos (IdISSC), Prof. Martin Lagos s/n, 28040, Madrid, Spain.
Sci Rep. 2025 Jul 1;15(1):20944. doi: 10.1038/s41598-025-05294-5.
Occupation is a crucial social determinant of health, influencing diagnostic accuracy, treatment strategies, and healthcare policy-making. However, in electronic health records (EHRs) it is often relegated to unstructured fields. This study aims to assess the collection and use of occupation-related data in rheumatology clinical narratives, describe the factors influencing its collection, and analyze its association with patient diagnoses. We employed a pre-trained Spanish language model fine-tuned with biomedical texts to identify occupation mentions in the EHRs of 35,586 patients with rheumatic diseases. The model's performance was evaluated against a gold-standard dataset using precision, recall, and F1-score. Occupation mentions were normalized to the European Skills, Competences, Qualifications and Occupations (ESCO) classification. Logistic regression analyses identified sociodemographic and clinical predictors of occupation collection and examined associations between occupations and diagnoses. The model achieved an F1-score of 0.73 and identified valid occupation mentions in 3,527 patients (10%). Normalization yielded 402 ESCO codes. Mechanical pathologies such as back pain and muscle disorders were associated with a higher probability of occupation collection, and occupations such as cleaners and helpers were linked to these conditions. Customer service clerks and hairdressers were associated with autoimmune diseases. This study demonstrates the feasibility of automated occupation recognition in EHRs and highlights the relevance of occupation as a social determinant of health in rheumatology. Integrating such data could inform targeted prevention and treatment strategies for rheumatic diseases.
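As an illustration of the evaluation step mentioned in the abstract, the sketch below computes precision, recall, and F1-score for predicted occupation mentions against a gold standard. It is a minimal example under stated assumptions, not the authors' procedure: the abstract does not say how mentions were matched, so exact-span matching is assumed here, and the `Mention` tuple, the function name, and the toy data are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation of one occupation mention:
# (note identifier, start character offset, end character offset).
Mention = Tuple[str, int, int]


def precision_recall_f1(
    gold: Set[Mention], predicted: Set[Mention]
) -> Tuple[float, float, float]:
    """Score predicted mentions against gold mentions using exact-span matching.

    Exact-span matching is an assumption made for this sketch; the source
    does not state the matching criterion used in the study.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only (not taken from the study).
gold_mentions = {
    ("note1", 10, 18),
    ("note1", 40, 52),
    ("note2", 5, 12),
    ("note3", 0, 9),
}
predicted_mentions = {
    ("note1", 10, 18),  # matches a gold mention
    ("note2", 5, 12),   # matches a gold mention
    ("note2", 30, 36),  # spurious prediction
}

precision, recall, f1 = precision_recall_f1(gold_mentions, predicted_mentions)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With two of three predictions correct and two of four gold mentions found, the toy example gives a precision of 2/3, a recall of 1/2, and an F1-score of 4/7. A more lenient criterion, such as partial span overlap, would generally give higher scores than exact matching.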