Classifying the lifestyle status for Alzheimer's disease from clinical notes using deep learning with weak supervision.

Affiliations

College of Science and Engineering, University of Minnesota, Minneapolis, USA.

Department of Pharmaceutical Care and Health Systems, University of Minnesota, Minneapolis, USA.

Publication information

BMC Med Inform Decis Mak. 2022 Jul 7;22(Suppl 1):88. doi: 10.1186/s12911-022-01819-4.

DOI:10.1186/s12911-022-01819-4
PMID:35799294
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9261217/
Abstract

BACKGROUND

Since no effective therapies exist for Alzheimer's disease (AD), prevention through lifestyle changes and interventions has become more critical. Analyzing electronic health records (EHRs) of patients with AD can help us better understand the effect of lifestyle on AD. However, lifestyle information is typically stored in clinical narratives rather than in structured fields. Thus, the objective of the study was to compare different natural language processing (NLP) models on classifying lifestyle statuses (e.g., physical activity and excessive diet) from clinical texts in English.

METHODS

Based on the collected concept unique identifiers (CUIs) associated with lifestyle status, we extracted all related EHRs for patients with AD from the Clinical Data Repository (CDR) of the University of Minnesota (UMN). We automatically generated labels for the training data using a rule-based NLP algorithm. On this weakly labeled training corpus, we trained pre-trained Bidirectional Encoder Representations from Transformers (BERT) models and, as baselines, three traditional machine learning models. These models include the BERT base model, PubMedBERT (abstracts + full text), PubMedBERT (only abstracts), Unified Medical Language System (UMLS) BERT, BioBERT, Bio-clinical BERT, logistic regression, support vector machine, and random forest. We performed two case studies, physical activity and excessive diet, to validate the effectiveness of BERT models in classifying lifestyle status. All models were evaluated and compared on the developed Gold Standard Corpus (GSC) for both case studies, and the rule-based model used for weak supervision was also tested on the GSC for comparison.
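The weak-labeling step described above can be illustrated with a minimal Python sketch. The keyword patterns, the label names (`active`, `inactive`, `unknown`), and the `weak_label` helper are hypothetical and chosen only for illustration; the abstract does not describe the actual rules, which were built from UMLS CUIs.

```python
import re

# Hypothetical keyword rules for one case study (physical activity).
# These patterns are illustrative assumptions, not the rules used in the study.
POSITIVE = re.compile(r"\b(walks?|exercis\w*|jogs?|swims?|physically active)\b", re.I)
NEGATION = re.compile(r"\b(no|not|denies|unable to|does not)\b", re.I)

def weak_label(sentence: str) -> str:
    """Assign a noisy label to a sentence using simple rules."""
    match = POSITIVE.search(sentence)
    if match is None:
        return "unknown"
    # Look for a negation cue in the text that precedes the activity mention.
    if NEGATION.search(sentence[:match.start()]):
        return "inactive"
    return "active"

notes = [
    "Patient walks two miles every morning.",
    "She denies any regular exercise.",
    "Follow up in three months.",
]

# The weakly labeled corpus pairs each sentence with its rule-derived label.
weak_corpus = [(s, weak_label(s)) for s in notes]
for sentence, label in weak_corpus:
    print(label, "|", sentence)
```

In a setup like the one the abstract describes, the sentences labeled this way would serve as training data for the classifiers, while a separately annotated gold standard would be held out for evaluation.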

RESULTS

The UMLS BERT model achieved the best performance for classifying physical activity status, with precision, recall, and F1 scores of 0.93, 0.93, and 0.92, respectively. For classifying excessive diet, the Bio-clinical BERT model showed the best performance, with precision, recall, and F1 scores of 0.93, 0.93, and 0.93, respectively.
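As a rough illustration of how such scores are computed against a gold standard, the sketch below calculates macro-averaged precision, recall, and F1 in plain Python. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, and the `macro_prf` helper and the toy labels are hypothetical. With averaged per-class scores, the averaged F1 can fall below both the averaged precision and the averaged recall, as the toy example shows.

```python
from collections import Counter

def macro_prf(gold, pred):
    """Return macro-averaged (precision, recall, F1) over the gold label set."""
    labels = sorted(set(gold))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        prec = tp[label] / p_den if p_den else 0.0
        rec = tp[label] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy gold-standard and predicted labels (made up for illustration).
gold = ["active", "active", "active", "inactive"]
pred = ["active", "active", "inactive", "inactive"]
precision, recall, f1 = macro_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```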

CONCLUSION

The proposed approach, which leverages weak supervision, could significantly increase the sample size available for training deep learning models. Through comparison with traditional machine learning models, the study also demonstrates the high performance of BERT models in classifying lifestyle status for Alzheimer's disease from clinical notes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/42f8/9264482/b6b45aa48293/12911_2022_1819_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/42f8/9264482/ffee00f72c3f/12911_2022_1819_Fig1_HTML.jpg
