SDoH-GPT: using large language models to extract social determinants of health.

Authors

Consoli Bernardo, Wang Haoyang, Wu Xizhi, Wang Song, Zhao Xinyu, Wang Yanshan, Rousseau Justin, Hartvigsen Tom, Shen Li, Wu Huanmei, Peng Yifan, Long Qi, Chen Tianlong, Ding Ying

Affiliations

School of Information, University of Texas at Austin, Austin, TX 78712, United States.

School of Technology, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre 90619-900, Brazil.

Publication information

J Am Med Inform Assoc. 2025 Jun 10. doi: 10.1093/jamia/ocaf094.

Abstract

OBJECTIVE

Extracting social determinants of health (SDoHs) from medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. Here, we introduce SDoH-GPT, a novel framework leveraging few-shot learning large language models (LLMs) to automate the extraction of SDoH from unstructured text, aiming to improve both efficiency and generalizability.

MATERIALS AND METHODS

SDoH-GPT is a framework with two components: few-shot learning LLM methods that extract SDoH from medical notes, and XGBoost classifiers that are trained on the annotations generated by those LLM methods and then continue to classify SDoH. This combination exploits the strength of LLMs as few-shot learners and the efficiency of XGBoost when the training dataset is sufficient. As a result, SDoH-GPT can extract SDoH without relying on extensive medical annotations or costly human intervention.
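The two-stage design described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the housing-instability label, the `fake_llm` stand-in (a keyword rule used so the sketch runs offline), and the use of TF-IDF features are all assumptions, since the abstract does not specify them. Stage 2 assumes `scikit-learn` and `xgboost` are installed.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical few-shot examples; the paper's actual prompts are not in the abstract.
FEW_SHOT_EXAMPLES: List[Tuple[str, int]] = [
    ("Patient is currently homeless and staying at a shelter.", 1),
    ("Patient lives with spouse in a single-family home.", 0),
]


def build_prompt(note: str,
                 examples: Sequence[Tuple[str, int]] = FEW_SHOT_EXAMPLES) -> str:
    """Assemble a few-shot prompt asking for a binary SDoH label."""
    lines = ["Label each note 1 if it mentions housing instability, else 0.", ""]
    for text, label in examples:
        lines += [f"Note: {text}", f"Label: {label}", ""]
    lines += [f"Note: {note}", "Label:"]
    return "\n".join(lines)


def annotate(notes: Sequence[str], llm: Callable[[str], str]) -> List[int]:
    """Stage 1: obtain one LLM label per note (llm maps a prompt to reply text)."""
    labels = []
    for note in notes:
        reply = llm(build_prompt(note)).strip()
        labels.append(1 if reply.startswith("1") else 0)
    return labels


def train_classifier(notes: Sequence[str], labels: Sequence[int]):
    """Stage 2: fit XGBoost on the LLM-generated labels.

    TF-IDF features are an assumption made for this sketch.
    """
    from sklearn.feature_extraction.text import TfidfVectorizer
    from xgboost import XGBClassifier

    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(notes)
    model = XGBClassifier(n_estimators=50, max_depth=3)
    model.fit(features, list(labels))
    return vectorizer, model


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: a keyword rule on the final note only."""
    target = prompt.rsplit("Note: ", 1)[1].lower()
    return "1" if ("homeless" in target or "evict" in target) else "0"


if __name__ == "__main__":
    notes = [
        "Pt has been homeless for three months.",
        "Lives with family in stable housing.",
        "Recently evicted, sleeping in car.",
        "Owns home, no housing concerns.",
    ]
    llm_labels = annotate(notes, fake_llm)            # LLM-generated annotations
    vectorizer, model = train_classifier(notes, llm_labels)
    new_note = vectorizer.transform(["Patient was evicted last month."])
    print(model.predict(new_note))
```

In practice `fake_llm` would be replaced by a call to an actual LLM, and the cheaper XGBoost model would take over classification once enough LLM-labeled notes have accumulated.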

RESULTS

Our approach achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators, as measured by a Cohen's kappa of up to 0.92. The combination of LLM and XGBoost delivers high accuracy and computational efficiency while consistently maintaining AUROC scores of 0.90 or higher.
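For readers unfamiliar with the two metrics reported here, the following sketch shows how Cohen's kappa and AUROC are commonly computed. It uses only the standard definitions (chance-corrected agreement, and the pairwise ranking form of AUROC); the toy labels and scores are invented for illustration and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labeled at random with their own label rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    human = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]   # toy human annotations
    model = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]   # toy model annotations
    print(round(cohens_kappa(human, model), 2))
    print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

On this toy data the two annotators agree on 9 of 10 items with an expected chance agreement of 0.5, giving a kappa of 0.8, and three of the four positive-negative pairs are ranked correctly, giving an AUROC of 0.75.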

DISCUSSION

This study validated SDoH-GPT on three datasets and highlights the potential of combining LLM and XGBoost to revolutionize medical note classification, demonstrating that the framework can achieve highly accurate classifications with significantly reduced time and cost.

CONCLUSION

The key contribution of this study is the integration of LLM with XGBoost, which enables cost-effective and high-quality annotations of SDoH. This research sets the stage for SDoH to become more accessible, scalable, and impactful in driving future healthcare solutions.
