[A customized method for information extraction from unstructured text data in the electronic medical records].

Authors

Bao X Y, Huang W J, Zhang K, Jin M, Li Y, Niu C Z

Affiliations

Medical Informatics Center, Peking University, Beijing 100191, China; National Clinical Service Data Center, Beijing 100191, China.

School of Mathematical Sciences, Peking University, Beijing 100871, China.

Publication information

Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):256-263.

PMID:29643524
Abstract

OBJECTIVE

Electronic medical records (EMRs) contain a huge amount of diagnostic and treatment information that directly reflects clinicians' actual practice. Many EMR sections, such as chief complaints, history of present illness, past history, differential diagnosis, diagnostic imaging, and surgical records, capture the details of the clinical process in Chinese natural-language narrative. Extracting useful information from this Chinese narrative text and organizing it into tabular form for medical research analysis, so that real-world clinical data can be put to practical use, is a difficult problem in Chinese medical data processing.

METHODS

Based on EMR narrative text data from a tertiary hospital in China, we propose a method that combines customized learning of information extraction rules with rule-based information extraction. The method consists of three steps. (1) A random sample of 600 EMR documents (including history of present illness, past history, personal history, family history, etc.) was drawn as the raw corpus. Using the Chinese clinical narrative text annotation platform we developed, trained clinicians and nurses marked the tokens and phrases in the corpus that were to be extracted (taking a history of diabetes as an example). (2) Extraction templates were first summarized and induced from the annotated corpus. These templates were then rewritten as regular expressions in the Perl programming language to serve as extraction rules. With these rules as the basic knowledge base, we developed extraction packages in Perl to extract data from the EMR text. Finally, the extracted data items were organized in a tabular format for later use in clinical research or hospital surveillance. (3) The proposed method was evaluated and validated on the National Clinical Service Data Integration Platform. The extraction results were checked using a combination of manual and automated verification, which demonstrated the effectiveness of the method.
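Step 2 can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in the study, and the regular expression, field names, and sample sentences are hypothetical illustrations, not the rules or data from the paper. It only shows the general idea of turning an extraction template into a regular expression and writing the matches out as table rows.

```python
import csv
import io
import re

# Hypothetical rule (an assumption, not taken from the paper): an optional
# negation cue, a diabetes mention, and an optional duration in years.
DIABETES_RULE = re.compile(
    r"(?P<negated>否认[^,,。;;]*?)?"   # "否认" (denies) earlier in the same clause
    r"(?P<disease>[12]型糖尿病|糖尿病)"  # diabetes, optionally typed
    r"(?:病?史)?"                       # optional "(病)史" (history)
    r"(?:(?P<years>\d+)\s*余?年)?"      # optional duration such as "10年"
)

# Hypothetical column names for the tabular output.
FIELDS = ["record_id", "diabetes_history", "duration_years"]


def extract_diabetes_history(record_id, text):
    """Apply the rule to one narrative text and return one table row."""
    match = DIABETES_RULE.search(text)
    if match is None:
        return {"record_id": record_id,
                "diabetes_history": "not mentioned",
                "duration_years": None}
    negated = match.group("negated") is not None
    years = match.group("years")
    return {
        "record_id": record_id,
        "diabetes_history": "no" if negated else "yes",
        "duration_years": int(years) if years and not negated else None,
    }


def to_csv(rows):
    """Serialize the extracted rows as CSV text (header plus one line per row)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample sentences, for illustration only.
    samples = {
        "r1": "既往糖尿病病史10年,口服二甲双胍。",
        "r2": "否认高血压、糖尿病病史。",
        "r3": "既往体健。",
    }
    rows = [extract_diabetes_history(rid, text) for rid, text in samples.items()]
    print(to_csv(rows))
```

A single pattern like this is far narrower than a rule base induced from 600 annotated records. In practice each template would presumably become its own rule, and the negation handling here (a cue earlier in the same clause) is a deliberately simple assumption.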

RESULTS

Among all patients in the hospital's Department of Endocrinology with a diagnosis of diabetes, 1,436 were discharged in 2015. Extraction of diabetes history from the medical history sections of their records achieved a recall of 87.6%, a precision (reported as the accuracy rate) of 99.5%, and an F-score of 0.93. For 10% of the patients with diabetes in the same department, by discharge dates up to August 2017 (1,223 patients in total), extraction of diabetes history achieved a recall of 89.2%, a precision (reported as the accuracy rate) of 99.2%, and an F-score of 0.94.
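The reported figures can be cross-checked with a short sketch. It assumes that the abstract's "accuracy rate" plays the role of precision and that the F-score is the balanced F1 (the harmonic mean of precision and recall). Both are assumptions, although the reported numbers are consistent with them. The function names are illustrative.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from verification counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 2015 cohort: precision 99.5%, recall 87.6%
    print(round(f_score(0.995, 0.876), 2))  # 0.93
    # 2017 cohort: precision 99.2%, recall 89.2%
    print(round(f_score(0.992, 0.892), 2))  # 0.94
```

The abstract does not give the underlying true-positive, false-positive, and false-negative counts, so `precision_recall` is included only to show how such rates would be derived from manual verification results.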

CONCLUSION

This study combines natural language processing with rule-based information extraction to design and implement an algorithm for extracting customized information from unstructured Chinese EMR text data. The method is reported to achieve better results than existing work.
