Applications of Natural Language Processing and Large Language Models for Social Determinants of Health: Protocol for a Systematic Review

Author information

Rajwal Swati, Zhang Ziyuan, Chen Yankai, Rogers Hannah, Sarker Abeed, Xiao Yunyu

Affiliations

Department of Computer Science, Emory University, Atlanta, GA, United States.

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States.

Publication information

JMIR Res Protoc. 2025 Jan 21;14:e66094. doi: 10.2196/66094.

DOI:10.2196/66094

PMID:39836952

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11795155/

Abstract

BACKGROUND

In recent years, the intersection of natural language processing (NLP) and public health has opened innovative pathways for investigating social determinants of health (SDOH) in textual datasets. Despite the promise of NLP in the SDOH domain, the literature is dispersed across various disciplines, and there is a need to consolidate existing knowledge, identify knowledge gaps in the literature, and inform future research directions in this emerging field.

OBJECTIVE

This research protocol describes a systematic review to identify and highlight NLP techniques, including large language models, used for SDOH-related studies.

METHODS

A search strategy will be executed across PubMed, Web of Science, IEEE Xplore, Scopus, PsycINFO, HealthSource: Academic Nursing, and ACL Anthology to find studies published in English between 2014 and 2024. Three reviewers (SR, ZZ, and YC) will independently screen the studies to avoid voting bias, and two additional reviewers (AS and YX) will resolve any conflicts that arise during screening. We will also screen studies that cite the included studies (forward citation search). Following title and abstract screening and full-text screening, the characteristics and main findings of the included studies and resources will be tabulated, visualized, and summarized.
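The screening workflow described above can be illustrated with a minimal Python sketch. It is not the authors' tooling. The `Record` type, the `is_eligible` and `triage` helpers, and the rule that only unanimous votes are settled without adjudication are all assumptions made for illustration; the protocol states only that three reviewers screen independently and that two additional reviewers resolve conflicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record returned by a database search."""
    record_id: str
    year: int
    language: str


def is_eligible(record, start_year=2014, end_year=2024):
    """Apply the protocol's stated limits: English, published 2014 to 2024."""
    return record.language == "English" and start_year <= record.year <= end_year


def triage(votes):
    """Split records by reviewer agreement.

    votes maps record_id -> {reviewer_initials: include?}.
    Assumption: any disagreement counts as a conflict to be passed to the
    two adjudicating reviewers; the protocol does not define the rule.
    """
    included, excluded, conflicts = [], [], []
    for record_id, decisions in votes.items():
        unique = set(decisions.values())
        if unique == {True}:
            included.append(record_id)
        elif unique == {False}:
            excluded.append(record_id)
        else:
            conflicts.append(record_id)
    return included, excluded, conflicts


if __name__ == "__main__":
    example_votes = {
        "rec-1": {"SR": True, "ZZ": True, "YC": True},
        "rec-2": {"SR": True, "ZZ": False, "YC": True},
        "rec-3": {"SR": False, "ZZ": False, "YC": False},
    }
    included, excluded, conflicts = triage(example_votes)
    print("included:", included)
    print("excluded:", excluded)
    print("to adjudicators (AS, YX):", conflicts)
```

In this sketch a record with split votes, such as the illustrative `rec-2`, is neither included nor excluded until the adjudicating reviewers decide.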

RESULTS

The search strategy was formulated and run across the 7 databases in August 2024. As of December 2024, title and abstract screening was underway. We expect to submit the results for peer-reviewed publication in early 2025.

CONCLUSIONS

This systematic review aims to provide a comprehensive synthesis of existing research on the application of NLP to various SDOH tasks across multiple textual datasets. By rigorously evaluating the methodologies, tools, and outcomes of eligible studies, the review will identify gaps in current knowledge and suggest directions for future research in the form of specific research questions. The findings will be instrumental in developing more effective NLP models for SDOH, ultimately contributing to improved health outcomes and a better understanding of social determinants in diverse populations.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/66094.
