A Study of Social and Behavioral Determinants of Health in Lung Cancer Patients Using Transformers-based Natural Language Processing Models.

Affiliations

Department of Health Outcomes and Biomedical Informatics.

Cancer Informatics Shared Resources, University of Florida Health Cancer Center, University of Florida, Gainesville, Florida, USA.

Publication information

AMIA Annu Symp Proc. 2022 Feb 21;2021:1225-1233. eCollection 2021.

PMID: 35309014
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8861705/

Abstract

Social and behavioral determinants of health (SBDoH) play an important role in shaping people's health. In clinical research studies, especially comparative effectiveness studies, failure to adjust for SBDoH factors can cause confounding and misclassification errors in both statistical analyses and machine learning-based models. However, few studies have examined SBDoH factors in clinical outcomes, because current electronic health record (EHR) systems lack structured SBDoH information and much of that information is documented in clinical narratives. Natural language processing (NLP) is thus the key technology for extracting such information from unstructured clinical text, yet no mature clinical NLP system focuses on SBDoH. In this study, we examined two state-of-the-art transformer-based NLP models, BERT and RoBERTa, for extracting SBDoH concepts from clinical narratives, applied the best-performing model to extract SBDoH concepts from a lung cancer screening patient cohort, and examined the difference in SBDoH information between the NLP-extracted results and structured EHRs (SBDoH information captured in standard vocabularies such as the International Classification of Diseases codes). The experimental results show that the BERT-based NLP model achieved the best strict and lenient F1-scores of 0.8791 and 0.8999, respectively. The comparison between NLP-extracted SBDoH information and structured EHRs in the lung cancer patient cohort of 864 patients, with 161,933 clinical notes of various types, showed that much more detailed information about smoking, education, and employment was captured only in clinical narratives, and that both clinical narratives and structured EHRs are needed to construct a more complete picture of patients' SBDoH factors.
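The abstract reports strict and lenient F1-scores without defining them. The sketch below shows one common convention in clinical concept extraction and is an assumption, not the authors' evaluation code: a strict match requires identical span boundaries and concept type, while a lenient match only requires overlapping spans of the same type. The span tuples, the concept labels, and the helper names `_match` and `f1_score` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# A span is (start offset, exclusive end offset, concept type).
# This layout is an illustrative assumption, not taken from the paper.
Span = Tuple[int, int, str]


def _match(gold: Span, pred: Span, lenient: bool) -> bool:
    """Return True if a predicted span counts as matching a gold span."""
    if gold[2] != pred[2]:
        return False  # concept types must agree in both modes
    if lenient:
        # Lenient: any character overlap between the two spans.
        return gold[0] < pred[1] and pred[0] < gold[1]
    # Strict: boundaries must be identical.
    return gold[0] == pred[0] and gold[1] == pred[1]


def f1_score(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """Compute span-level F1 under strict or lenient matching."""
    matched_pred = sum(any(_match(g, p, lenient) for g in gold) for p in pred)
    matched_gold = sum(any(_match(g, p, lenient) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: one exact match, one partial overlap, one miss.
gold = [(0, 6, "smoking"), (10, 20, "employment"), (30, 38, "education")]
pred = [(0, 6, "smoking"), (12, 20, "employment"), (40, 45, "education")]

strict_f1 = f1_score(gold, pred, lenient=False)
lenient_f1 = f1_score(gold, pred, lenient=True)
print(f"strict F1 = {strict_f1:.4f}, lenient F1 = {lenient_f1:.4f}")
```

In the toy data the partially overlapping "employment" span counts only under lenient matching, so the lenient score is at least as high as the strict one. The same ordering appears in the reported 0.8791 and 0.8999.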
