Validation of a Natural Language Processing Algorithm for Detecting Infectious Disease Symptoms in Primary Care Electronic Medical Records in Singapore.

Authors

Hardjojo Antony, Gunachandran Arunan, Pang Long, Abdullah Mohammed Ridzwan Bin, Wah Win, Chong Joash Wen Chen, Goh Ee Hui, Teo Sok Huang, Lim Gilbert, Lee Mong Li, Hsu Wynne, Lee Vernon, Chen Mark I-Cheng, Wong Franco, Phang Jonathan Siung King

Affiliations

Saw Swee Hock School of Public Health, National University Health System, National University of Singapore, Singapore, Singapore.

National Healthcare Group Polyclinics, Singapore, Singapore.

Publication information

JMIR Med Inform. 2018 Jun 11;6(2):e36. doi: 10.2196/medinform.8204.

DOI:10.2196/medinform.8204
PMID:29907560
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6026305/
Abstract

BACKGROUND

Free-text clinical records provide a source of information that complements traditional disease surveillance. To electronically harness these records, they need to be transformed into codified fields by natural language processing algorithms.

OBJECTIVE

The aim of this study was to develop, train, and validate the Clinical History Extractor for Syndromic Surveillance (CHESS), a natural language processing algorithm that extracts clinical information from free-text primary care records.

METHODS

CHESS is a keyword-based natural language processing algorithm that extracts 48 signs and symptoms suggesting respiratory infections, gastrointestinal infections, and constitutional illness, as well as other signs and symptoms potentially associated with infectious diseases. The algorithm also captures the assertion status (affirmed, negated, or suspected) and symptom duration. Electronic medical records from the National Healthcare Group Polyclinics, a major public sector primary care provider in Singapore, were randomly extracted and manually reviewed by 2 human reviewers, with a third reviewer as the adjudicator. The algorithm was evaluated on 1680 notes against the human-coded result as the reference standard, with half of the data used for training and the other half for validation.
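
To make the idea of keyword-based extraction with assertion status and duration more concrete, the following is a minimal Python sketch. It is not the CHESS implementation: the lexicon, the negation and suspicion cues, the duration pattern, and the function names are all hypothetical and chosen only for illustration. The abstract does not describe the actual rules.

```python
import re

# Hypothetical mini-lexicon (assumption): canonical symptom -> surface keywords.
SYMPTOM_KEYWORDS = {
    "cough": ["cough"],
    "sore throat": ["sore throat", "throat pain"],
    "rhinorrhea": ["runny nose", "rhinorrhea"],
    "fever": ["fever", "febrile"],
}
# Hypothetical cue words (assumption), checked only in the text before the keyword.
NEGATION_CUES = {"no", "denies", "without", "nil"}
SUSPICION_CUES = {"possible", "suspected", "?"}
# Hypothetical duration pattern (assumption), e.g. "x 2 days" or "for 1 week".
DURATION_RE = re.compile(r"\b(?:x|for)\s*(\d+)\s*(day|week|month)s?\b")


def assertion_status(prefix):
    """Classify a mention from the cue tokens that precede it in the clause."""
    tokens = re.findall(r"[a-z]+|\?", prefix)
    if any(t in NEGATION_CUES for t in tokens):
        return "negated"
    if any(t in SUSPICION_CUES for t in tokens):
        return "suspected"
    return "affirmed"


def extract_symptoms(note):
    """Return one record per symptom mention found in each clause of the note."""
    results = []
    for clause in re.split(r"[.,;\n]", note.lower()):
        for symptom, keywords in SYMPTOM_KEYWORDS.items():
            for kw in keywords:
                m = re.search(r"\b" + re.escape(kw) + r"\b", clause)
                if m is None:
                    continue
                # Duration is looked up only in the text after the keyword.
                d = DURATION_RE.search(clause[m.end():])
                results.append({
                    "symptom": symptom,
                    "status": assertion_status(clause[:m.start()]),
                    "duration": (int(d.group(1)), d.group(2)) if d else None,
                })
                break  # one record per symptom per clause
    return results


if __name__ == "__main__":
    for record in extract_symptoms("Fever x 2 days, no cough. ?sore throat for 1 week"):
        print(record)
```

Splitting the note into clauses keeps a negation cue such as "no" from spilling over onto symptoms mentioned elsewhere in the note. A real system would need a far richer lexicon and rule set than this sketch.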

RESULTS

The symptoms most commonly present within the 1680 clinical records at the episode level were those typically present in respiratory infections such as cough (744/7703, 9.66%), sore throat (591/7703, 7.67%), rhinorrhea (552/7703, 7.17%), and fever (928/7703, 12.04%). At the episode level, CHESS had an overall performance of 96.7% precision and 97.6% recall on the training dataset and 96.0% precision and 93.1% recall on the validation dataset. Symptoms suggesting respiratory and gastrointestinal infections were all detected with more than 90% precision and recall. CHESS correctly assigned the assertion status in 97.3%, 97.9%, and 89.8% of affirmed, negated, and suspected signs and symptoms, respectively (97.6% overall accuracy). Symptom episode duration was correctly identified in 81.2% of records with known duration status.
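
For readers less familiar with the metrics reported here, the short Python sketch below shows how precision and recall are computed from true positive, false positive, and false negative counts. The counts passed to the function are hypothetical toy values, not figures from the study. Only the cough proportion (744/7703) is taken from the abstract.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy counts (assumption), not data from the paper.
    p, r = precision_recall(tp=90, fp=10, fn=30)
    print(f"precision={p:.1%} recall={r:.1%}")

    # Share of cough among all symptom episodes, as reported in the abstract.
    print(f"cough share={100 * 744 / 7703:.2f}%")
```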

CONCLUSIONS

We have developed a natural language processing algorithm, CHESS, that achieves good performance in extracting signs and symptoms from primary care free-text clinical records. In addition to detecting the presence of symptoms, the algorithm can accurately distinguish affirmed, negated, and suspected assertion statuses and extract symptom durations.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c8d/6026305/6f8e54b1f955/medinform_v6i2e36_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c8d/6026305/1a36c1cef7df/medinform_v6i2e36_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c8d/6026305/ee88406b658b/medinform_v6i2e36_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c8d/6026305/c5cf9c309d3a/medinform_v6i2e36_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c8d/6026305/ced27f4c131a/medinform_v6i2e36_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c8d/6026305/445ae9c92a86/medinform_v6i2e36_fig6.jpg
