Extraction of Social Determinants of Health From Electronic Health Records Using Natural Language Processing.

Author information

Chen Zhenghua, Lasserre Patricia, Lin Angela, Rajapakshe Rasika

Affiliations

BC Cancer Kelowna, Kelowna, Canada.

Computer Science, University of British Columbia-Okanagan, Kelowna, Canada.

Publication information

JCO Clin Cancer Inform. 2025 Jul;9:e2400317. doi: 10.1200/CCI-24-00317. Epub 2025 Jul 23.

DOI: 10.1200/CCI-24-00317

PMID: 40700678

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12309507/

Abstract

PURPOSE

Social Determinants of Health (SDoH) have a significant effect on health outcomes and inequalities. SDoH can be extracted from electronic health records (EHR) to aid policy development and research aimed at improving population health. Automated extraction using artificial intelligence (AI) can improve efficiency and cost-effectiveness. This study aimed to automatically extract comprehensive SDoH details from EHR using a natural language processing (NLP)-based AI pipeline.

MATERIALS AND METHODS

A curated set of 1,000 BC Cancer clinical documents rich in SDoH information served as the reference standard for training and evaluating NLP models. Two pipelines were compared: an open-source pipeline trained on the annotated medical documents and an industrial pretrained solution used as a benchmark. Three experiments were conducted to optimize the open-source pipeline's performance, assessing the effect of including subtype word positions during training. The optimized open-source pipeline, which outperformed the benchmark, was then used to extract SDoH information from 13,258 oncology documents.
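The abstract does not say how annotations or subtype word positions were represented for training. A minimal sketch, assuming character-offset span annotations that are converted to token-level BIO tags (a common input format for named-entity recognition models), shows what including or omitting subtype positions could mean. The `Annotation` class, the label names (`TOBACCO_USE`, `TOBACCO_AMOUNT`, `LIVING_STATUS`), and the whitespace tokenizer are hypothetical illustrations, not the study's annotation scheme.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    start: int                # character offset (inclusive)
    end: int                  # character offset (exclusive)
    label: str                # hypothetical label name, not the study's scheme
    is_subtype: bool = False  # True for subtype detail spans (assumption)


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace, keeping each token's character offsets."""
    tokens, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        tokens.append((tok, start, start + len(tok)))
        pos = start + len(tok)
    return tokens


def to_bio(text: str, annotations: List[Annotation],
           include_subtypes: bool) -> List[Tuple[str, str]]:
    """Convert character-span annotations to token-level BIO tags.

    When include_subtypes is False, subtype spans are dropped, so their
    tokens are tagged "O" and the model never sees their positions.
    """
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for ann in annotations:
        if ann.is_subtype and not include_subtypes:
            continue
        # Indices of tokens that overlap the annotated character range.
        covered = [i for i, (_, s, e) in enumerate(tokens)
                   if s < ann.end and e > ann.start]
        for k, i in enumerate(covered):
            tags[i] = ("B-" if k == 0 else "I-") + ann.label
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


def span(text: str, phrase: str, label: str,
         is_subtype: bool = False) -> Annotation:
    """Hypothetical helper: annotate the first occurrence of phrase."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), label, is_subtype)


# Invented example sentence, not taken from BC Cancer data.
note = "Patient smokes 1 pack per day and lives alone."
anns = [
    span(note, "smokes", "TOBACCO_USE"),
    span(note, "1 pack per day", "TOBACCO_AMOUNT", is_subtype=True),
    span(note, "lives alone", "LIVING_STATUS"),
]

for include in (False, True):
    print(include, to_bio(note, anns, include_subtypes=include))
```

With `include_subtypes=False`, the tokens of "1 pack per day" are tagged `O`; with `True`, they carry `B-TOBACCO_AMOUNT` and `I-TOBACCO_AMOUNT` tags. The whitespace tokenizer keeps the sketch self-contained but leaves punctuation attached to tokens ("alone."); a real pipeline would use a proper tokenizer.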

RESULTS

The open-source pipeline achieved an average F1 score of 0.88 on the validation data set for extracting 13 SDoH factors, surpassing the benchmark by 5%. It excelled at extracting detailed subtypes, whereas the benchmark performed better at identifying SDoH information that was rarely annotated in the BC Cancer data set. Overall, 60,717 SDoH factors and associated details were extracted from BC Cancer EHR oncology documents. The most frequently extracted SDoH factors were tobacco use, employment status, marital status, alcohol consumption, and living status, each occurring between 8,000 and 12,000 times.
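The abstract reports an average F1 score over 13 factors but does not state the averaging scheme. The sketch below assumes per-factor F1 computed from true positive (TP), false positive (FP), and false negative (FN) counts, and contrasts a macro average (unweighted mean over factors) with a micro average (F1 over pooled counts). The factor keys and counts are invented placeholders, not study results.

```python
from typing import Dict, Tuple

Counts = Dict[str, Tuple[int, int, int]]  # factor -> (TP, FP, FN)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Counts) -> float:
    """Unweighted mean of per-factor F1 scores (assumed averaging scheme)."""
    scores = [f1(*c) for c in counts.values()]
    return sum(scores) / len(scores)


def micro_f1(counts: Counts) -> float:
    """F1 over TP, FP, and FN pooled across all factors."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical counts for three factors; not the study's data.
counts: Counts = {
    "tobacco_use": (80, 10, 10),
    "marital_status": (30, 10, 10),
    "alcohol_use": (5, 0, 5),  # a rarely annotated factor
}

print(f"macro F1 = {macro_f1(counts):.3f}")  # macro F1 = 0.769
print(f"micro F1 = {micro_f1(counts):.3f}")  # micro F1 = 0.836
```

Macro averaging gives a rarely annotated factor the same weight as a frequent one, so weak performance on rare factors lowers it more than the micro average, which is dominated by the most common factors.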

CONCLUSION

This study demonstrates the potential of an NLP pipeline to extract SDoH factors from clinical notes, with strong performance on limited data, although data set-specific adjustments are needed for broader application across institutions.
