Quantifying risk factors in medical reports with a context-aware linear model.

Affiliation

National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, United Kingdom.

Publication information

J Am Med Inform Assoc. 2019 Jun 1;26(6):537-546. doi: 10.1093/jamia/ocz004.

DOI: 10.1093/jamia/ocz004
PMID: 30840055
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6515525/

Abstract

OBJECTIVE

We seek to quantify the mortality risk associated with mentions of medical concepts in textual electronic health records (EHRs). Recognizing mentions of named entities of relevant types (eg, conditions, symptoms, laboratory tests or behaviors) in text is a well-researched task. However, determining the level of risk associated with them is partly dependent on the textual context in which they appear, which may describe severity, temporal aspects, quantity, etc.

METHODS

To take into account that a given word appearing in the context of different risk factors (medical concepts) can make different contributions toward risk level, we propose a multitask approach, called context-aware linear modeling, which can be applied using appropriately regularized linear regression. To improve the performance for risk factors unseen in training data (eg, rare diseases), we take into account their distributional similarity to other concepts.
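
The idea in this paragraph can be illustrated with a small sketch. It is not the authors' implementation: it assumes a standard feature-augmentation form of multitask ridge regression, in which every context word has one shared weight plus one weight per known concept. The vocabulary, concepts, risk ratings, and similarity scores below are hypothetical toy values, and `build_features`, `fit_ridge`, and `predict_unseen` are illustrative names. For a concept absent from training, the sketch blends the concept-specific weights of known concepts using externally supplied similarity scores (for instance, cosine similarity between distributional vectors).

```python
import numpy as np

# Hypothetical toy vocabulary and concept inventory (not taken from the paper).
VOCAB = {"increased": 0, "decreased": 1}
CONCEPTS = {"blood_pressure": 0, "appetite": 1}


def build_features(context_words, concept, vocab=VOCAB, concepts=CONCEPTS):
    """Bag-of-words vector with a shared block followed by one block per concept."""
    v = len(vocab)
    x = np.zeros(v * (len(concepts) + 1))
    for word in context_words:
        if word not in vocab:
            continue
        j = vocab[word]
        x[j] += 1.0  # shared copy: contribution common to all concepts
        if concept in concepts:
            # concept-specific copy: lets the same word weigh differently per concept
            x[v * (1 + concepts[concept]) + j] += 1.0
    return x


def fit_ridge(X, y, lam=0.1):
    """Closed-form L2-regularized least squares (no intercept, for brevity)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def predict_unseen(context_words, similarity, w, vocab=VOCAB, concepts=CONCEPTS):
    """Score a concept unseen in training by blending known concepts' weights.

    `similarity` maps known concept names to positive scores; how those scores
    are obtained is an assumption left outside this sketch.
    """
    v = len(vocab)
    total = sum(similarity.values())
    score = 0.0
    for word in context_words:
        if word not in vocab:
            continue
        j = vocab[word]
        score += w[j]  # shared contribution
        for name, s in similarity.items():
            score += (s / total) * w[v * (1 + concepts[name]) + j]
    return score


# Hypothetical training mentions: (context words, concept, expert risk rating).
train = [
    (["increased"], "blood_pressure", 3.0),
    (["decreased"], "blood_pressure", 1.0),
    (["increased"], "appetite", 0.0),
    (["decreased"], "appetite", 2.0),
]
X = np.array([build_features(words, concept) for words, concept, _ in train])
y = np.array([rating for _, _, rating in train])
w = fit_ridge(X, y)

# The same word "increased" yields different risk scores for different concepts.
pred_bp = build_features(["increased"], "blood_pressure") @ w
pred_app = build_features(["increased"], "appetite") @ w

# An unseen concept assumed to be mostly similar to blood_pressure.
pred_new = predict_unseen(["increased"], {"blood_pressure": 0.8, "appetite": 0.2}, w)
print(pred_bp, pred_app, pred_new)
```

In this toy setting the regularizer pulls weights toward zero, so a word's behaviour that is consistent across concepts is cheapest to express through the shared block, while the concept-specific blocks capture deviations. A single-task model would correspond to keeping only the concept-specific blocks.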

RESULTS

The evaluation is based on a corpus of 531 reports from EHRs with 99 376 risk factors rated manually by experts. Context-aware linear modeling significantly outperforms single-task models, and taking concept similarity into account improves performance further, reaching the level of agreement between human annotators.

CONCLUSION

Our results show that automatic quantification of risk factors in EHRs can achieve performance comparable to human assessment, and that accounting for the multitask structure of the problem and handling rare concepts are both crucial for accuracy.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/754b/6515525/497dea151346/ocz004f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/754b/6515525/3f227347f989/ocz004f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/754b/6515525/eef51a59aff6/ocz004f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/754b/6515525/115cb97a3499/ocz004f4.jpg

Similar articles

1
Clinical risk prediction using language models: benefits and considerations.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1856-1864. doi: 10.1093/jamia/ocae030.
2
Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource.
PLoS One. 2016 Sep 19;11(9):e0162287. doi: 10.1371/journal.pone.0162287. eCollection 2016.
3
Challenges in clinical natural language processing for automated disorder normalization.
J Biomed Inform. 2015 Oct;57:28-37. doi: 10.1016/j.jbi.2015.07.010. Epub 2015 Jul 14.
4
A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
5
Risk prediction using natural language processing of electronic mental health records in an inpatient forensic psychiatry setting.
J Biomed Inform. 2018 Oct;86:49-58. doi: 10.1016/j.jbi.2018.08.007. Epub 2018 Aug 14.
6
A nursing note-aware deep neural network for predicting mortality risk after hospital discharge.
Int J Nurs Stud. 2024 Aug;156:104797. doi: 10.1016/j.ijnurstu.2024.104797. Epub 2024 May 9.
7
Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.
J Am Med Inform Assoc. 2020 Jan 1;27(1):89-98. doi: 10.1093/jamia/ocz153.
8
Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes.
J Biomed Inform. 2020 Feb;102:103354. doi: 10.1016/j.jbi.2019.103354. Epub 2019 Dec 12.
9
Automatic quantitative stroke severity assessment based on Chinese clinical named entity recognition with domain-adaptive pre-trained large language model.
Artif Intell Med. 2024 Apr;150:102822. doi: 10.1016/j.artmed.2024.102822. Epub 2024 Feb 27.

Cited by

1
Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning.
PLOS Digit Health. 2023 Jul 19;2(7):e0000308. doi: 10.1371/journal.pdig.0000308. eCollection 2023 Jul.
