Validation of an Improved Computer-Assisted Technique for Mining Free-Text Electronic Medical Records.

Authors

Duz Marco, Marshall John F, Parkin Tim

Affiliations

School of Veterinary Medicine and Science, University of Nottingham, Loughborough, United Kingdom.

School of Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.

Publication

JMIR Med Inform. 2017 Jun 29;5(2):e17. doi: 10.2196/medinform.7123.

DOI:10.2196/medinform.7123
PMID:28663163
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5509949/
Abstract

BACKGROUND

The use of electronic medical records (EMRs) offers opportunities for clinical epidemiological research. With large EMR databases, automated analysis processes are necessary, but they require thorough validation before they can be used routinely.

OBJECTIVE

The aim of this study was to validate a computer-assisted technique using commercially available content analysis software (SimStat-WordStat v.6 (SS/WS), Provalis Research) for mining free-text EMRs.

METHODS

The dataset used for the validation process included lifelong EMRs from 335 patients (17,563 rows of data), selected at random from a larger dataset (141,543 patients, ~2.6 million rows of data) obtained from 10 equine veterinary practices in the United Kingdom. The ability of the computer-assisted technique to detect rows of data (cases) relating to colic, renal failure, right dorsal colitis, and non-steroidal anti-inflammatory drug (NSAID) use in the population was compared with manual classification. The first step of the computer-assisted analysis process was to define inclusion dictionaries containing the terms that identify each condition of interest. Words in the inclusion dictionaries were selected from the list of all words in the dataset generated by SS/WS. The second step was to define an exclusion dictionary of word combinations that remove cases erroneously classified by the inclusion dictionary alone. The third step was to define a reinclusion dictionary that restores cases erroneously removed by the exclusion dictionary. Finally, cases obtained by the exclusion dictionary were removed from cases obtained by the inclusion dictionary, and cases from the reinclusion dictionary were then added back using R v3.0.2 (R Foundation for Statistical Computing, Vienna, Austria). Manual analysis was performed as a separate process by a single experienced clinician, who read through the dataset once and classified each row of data based on their interpretation of the free-text notes. Validation was performed by comparing the computer-assisted method with manual analysis, which was used as the gold standard. Sensitivity, specificity, negative predictive values (NPVs), positive predictive values (PPVs), and F values of the computer-assisted process were calculated against the manual classification.
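The three-dictionary logic described above amounts to a set operation on row indices: inclusion minus exclusion, plus whatever the reinclusion dictionary rescues. The sketch below shows that logic in Python. It is not the authors' SS/WS and R workflow. The case-insensitive, word-boundary phrase matching is an assumption standing in for SS/WS dictionary matching. The rule that reinclusion only applies to rows that were included and then excluded is also an assumption. The rows and the colic dictionaries are hypothetical.

```python
import re
from typing import Sequence, Set


def matches(text: str, entries: Sequence[str]) -> bool:
    """True if any dictionary entry (a word or multi-word phrase) occurs in the
    text, matched case-insensitively on word boundaries. Assumption: a simplified
    stand-in for SS/WS dictionary matching, not its actual rules."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(entry.lower()) + r"\b", lowered)
        for entry in entries
    )


def classify_rows(rows: Sequence[str], inclusion: Sequence[str],
                  exclusion: Sequence[str], reinclusion: Sequence[str]) -> Set[int]:
    """Return the indices of rows classified as cases."""
    included = {i for i, row in enumerate(rows) if matches(row, inclusion)}
    excluded = {i for i, row in enumerate(rows) if matches(row, exclusion)}
    # Assumption: reinclusion only rescues rows that were included, then excluded.
    reincluded = {i for i in included & excluded if matches(rows[i], reinclusion)}
    # Remove excluded rows from the inclusion set, then add rescued rows back.
    return (included - excluded) | reincluded


# Hypothetical free-text rows and dictionaries for a colic case definition.
rows = [
    "Mild colic this morning, given flunixin",
    "No colic signs since last visit",
    "Vaccination booster, no issues",
    "No colic signs yesterday but colic recurred tonight",
]
cases = classify_rows(
    rows,
    inclusion=["colic"],
    exclusion=["no colic"],
    reinclusion=["colic recurred"],
)
print(sorted(cases))  # [0, 3]
```

Row 1 is removed by the exclusion phrase. Row 3 is also excluded at first, but it is added back because it matches the reinclusion phrase.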

RESULTS

The lowest sensitivity, specificity, PPV, NPV, and F value were 99.82% (1128/1130), 99.88% (16410/16429), 94.6% (223/239), 100.00% (16410/16412), and 99.0% (100×2×0.983×0.998/[0.983+0.998]), respectively. The computer-assisted process took only a few seconds to run, although dictionary creation required an estimated 30 h. Manual classification required approximately 80 man-hours.
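The metrics above come from a 2×2 comparison of computer-assisted and manual (gold standard) labels. The sketch below assumes the standard definitions of sensitivity, specificity, PPV and NPV. It also assumes that the F value is F1, the harmonic mean of PPV and sensitivity, which matches the form of the expression reported. The `Confusion` class and the counts in the second example are hypothetical illustrations and do not come from the study.

```python
from dataclasses import dataclass


def f_value(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


@dataclass
class Confusion:
    """2x2 table of computer-assisted vs manual (gold standard) row labels."""
    tp: int  # case according to both methods
    fp: int  # case according to the computer only
    fn: int  # case according to manual review only
    tn: int  # case according to neither

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def f1(self) -> float:
        return f_value(self.ppv(), self.sensitivity())


# The F value expression from the Results. The formula is symmetric, so it does
# not matter which of 0.983 and 0.998 is PPV and which is sensitivity.
print(round(100 * f_value(0.983, 0.998), 1))  # 99.0

# Hypothetical counts, for illustration only.
table = Confusion(tp=90, fp=10, fn=5, tn=895)
print(round(table.ppv(), 3), round(table.sensitivity(), 3))  # 0.9 0.947
```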

CONCLUSIONS

The critical step in this work is the creation of accurate and inclusive dictionaries to ensure that no potential cases are missed. It is significantly easier to remove false positive terms from an SS/WS-selected subset of a large database than to search the original database for potential false negatives. The benefits of using this method are proportional to the size of the dataset to be analyzed.
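The Results give enough figures for a rough sense of how the benefit scales with dataset size. The back-of-envelope sketch below uses the reported 80 man-hours for 17,563 rows and the 30 h for dictionary creation. It rests on two loudly stated assumptions that the study does not test. First, manual review time grows linearly with the number of rows. Second, dictionary creation time stays fixed as the dataset grows. The script run time of a few seconds is ignored.

```python
MANUAL_HOURS = 80.0       # reported manual classification time
VALIDATION_ROWS = 17_563  # rows in the validation dataset
DICTIONARY_HOURS = 30.0   # reported dictionary creation time


def manual_hours(rows: int) -> float:
    """Assumption: manual review time grows linearly with the number of rows."""
    return MANUAL_HOURS * rows / VALIDATION_ROWS


def break_even_rows() -> float:
    """Rows at which manual review would take as long as dictionary creation,
    assuming dictionary time stays fixed and ignoring script run time."""
    return DICTIONARY_HOURS * VALIDATION_ROWS / MANUAL_HOURS


print(round(break_even_rows()))        # 6586
print(round(manual_hours(2_600_000)))  # 11843, for ~2.6 million rows
```

Under these assumptions, manual review of the full ~2.6 million-row dataset would take on the order of 11,800 hours, compared with a fixed dictionary cost of tens of hours.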

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51fb/5509949/f5e847b23e4b/medinform_v5i2e17_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51fb/5509949/31c60e5dc94c/medinform_v5i2e17_fig1.jpg

Similar articles

1
Validation of an Improved Computer-Assisted Technique for Mining Free-Text Electronic Medical Records.
JMIR Med Inform. 2017 Jun 29;5(2):e17. doi: 10.2196/medinform.7123.
2
Validation of text-mining and content analysis techniques using data collected from veterinary practice management software systems in the UK.
Prev Vet Med. 2019 Jun 1;167:61-67. doi: 10.1016/j.prevetmed.2019.02.015. Epub 2019 Mar 14.
3
Mining free-text medical records for companion animal enteric syndrome surveillance.
Prev Vet Med. 2014 Mar 1;113(4):417-22. doi: 10.1016/j.prevetmed.2014.01.017. Epub 2014 Jan 20.
4
A rule-based electronic phenotyping algorithm for detecting clinically relevant cardiovascular disease cases.
BMC Res Notes. 2017 Jul 14;10(1):281. doi: 10.1186/s13104-017-2600-2.
5
Risk prediction using natural language processing of electronic mental health records in an inpatient forensic psychiatry setting.
J Biomed Inform. 2018 Oct;86:49-58. doi: 10.1016/j.jbi.2018.08.007. Epub 2018 Aug 14.
6
Dictionary construction and identification of possible adverse drug events in Danish clinical narrative text.
J Am Med Inform Assoc. 2013 Sep-Oct;20(5):947-53. doi: 10.1136/amiajnl-2013-001708. Epub 2013 May 23.
7
Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data.
J Biomed Inform. 2017 May;69:160-176. doi: 10.1016/j.jbi.2017.04.008. Epub 2017 Apr 12.
8
Toward better public health reporting using existing off the shelf approaches: A comparison of alternative cancer detection approaches using plaintext medical data and non-dictionary based feature selection.
J Biomed Inform. 2016 Apr;60:145-52. doi: 10.1016/j.jbi.2016.01.008. Epub 2016 Jan 28.
9
A dictionary learning approach for human sperm heads classification.
Comput Biol Med. 2017 Dec 1;91:181-190. doi: 10.1016/j.compbiomed.2017.10.009. Epub 2017 Oct 10.
10
Evaluating the Applicability of Existing Lexicon-Based Sentiment Analysis Techniques on Family Medicine Resident Feedback Field Notes: Retrospective Cohort Study.
JMIR Med Educ. 2023 Jul 27;9:e41953. doi: 10.2196/41953.

Cited by

1
Predicting COVID-19 Symptoms From Free Text in Medical Records Using Artificial Intelligence: Feasibility Study.
JMIR Med Inform. 2022 Apr 27;10(4):e37771. doi: 10.2196/37771.
2
A Semiautomated Chart Review for Assessing the Development of Radiation Pneumonitis Using Natural Language Processing: Diagnostic Accuracy and Feasibility Study.
JMIR Med Inform. 2021 Nov 12;9(11):e29241. doi: 10.2196/29241.
3
Using Electronic Medical Record Data for Research in a Healthcare Information and Management Systems Society (HIMSS) Analytics Electronic Medical Record Adoption Model (EMRAM) Stage 7 Hospital in Beijing: Cross-sectional Study.
JMIR Med Inform. 2021 Aug 3;9(8):e24405. doi: 10.2196/24405.
4
Open Agile text mining for bioinformatics: the PubAnnotation ecosystem.
Bioinformatics. 2019 Nov 1;35(21):4372-4380. doi: 10.1093/bioinformatics/btz227.
5
Validation of a Natural Language Processing Algorithm for Detecting Infectious Disease Symptoms in Primary Care Electronic Medical Records in Singapore.
JMIR Med Inform. 2018 Jun 11;6(2):e36. doi: 10.2196/medinform.8204.
6
A Pilot Study of Biomedical Text Comprehension using an Attention-Based Deep Neural Reader: Design and Experimental Analysis.
JMIR Med Inform. 2018 Jan 5;6(1):e2. doi: 10.2196/medinform.8751.