A de-identification method for bilingual clinical texts of various note types.

Author information

Shin Soo-Yong, Park Yu Rang, Shin Yongdon, Choi Hyo Joung, Park Jihyun, Lyu Yongman, Lee Moo-Song, Choi Chang-Min, Kim Woo-Sung, Lee Jae Ho

Affiliations

Department of Biomedical Informatics, Asan Medical Center, Seoul, Korea; Office of Clinical Research Information, Asan Medical Center, Seoul, Korea.

Office of Clinical Research Information, Asan Medical Center, Seoul, Korea.

Publication information

J Korean Med Sci. 2015 Jan;30(1):7-15. doi: 10.3346/jkms.2015.30.1.7. Epub 2014 Dec 23.

Abstract

De-identification of personal health information is essential for using clinical data without requiring written informed consent from patients. Earlier de-identification methods used natural language processing to remove identifiers from clinical narrative text, but they focused only on text written in English. In this study, we propose a regular expression-based de-identification method for bilingual clinical records written in Korean and English. To develop and validate the regular expression rules, we obtained a development dataset of 6,039 clinical notes of 20 types and a validation dataset of 5,000 notes of 33 types. Fifteen regular expression rules were constructed using the development dataset, and they achieved 99.87% precision and 96.25% recall on the validation dataset. The method successfully removed identifiers from diverse types of bilingual clinical narrative text and should therefore help physicians perform retrospective research more easily.
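To make the approach concrete, here is a minimal Python sketch of rule-based de-identification followed by precision and recall scoring. It is an illustration only: the four patterns, the labels, and the function names are assumptions made for this example and are not the fifteen rules from the paper, which the abstract does not list.

```python
import re

# Hypothetical rules for illustration; NOT the paper's actual fifteen rules.
# Digit lookarounds are used instead of \b because Hangul characters count
# as word characters and may directly follow a number in Korean text.
RULES = [
    ("NAME", re.compile(r"(?<=환자명: )[가-힣]{2,4}")),                 # Korean name after a label
    ("RRN", re.compile(r"(?<!\d)\d{6}-\d{7}(?!\d)")),                   # resident registration number
    ("PHONE", re.compile(r"(?<!\d)0\d{1,2}-\d{3,4}-\d{4}(?!\d)")),      # phone number
    ("DATE", re.compile(r"(?<!\d)\d{4}[-./]\d{1,2}[-./]\d{1,2}(?!\d)")),  # numeric date
]


def find_identifiers(text):
    """Return sorted (start, end, label) spans matched by any rule."""
    spans = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def deidentify(text):
    """Replace every matched span with a bracketed label placeholder."""
    pieces, pos = [], 0
    for start, end, label in find_identifiers(text):
        if start < pos:  # skip a span that overlaps one already replaced
            continue
        pieces.append(text[pos:start])
        pieces.append(f"[{label}]")
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)


def precision_recall(predicted, gold):
    """Span-level precision = TP / |predicted|, recall = TP / |gold|."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


note = ("환자명: 홍길동 RRN 900101-1234567 tel 010-1234-5678. "
        "Visited 2014-12-23 for chest pain f/u.")
print(deidentify(note))
# 환자명: [NAME] RRN [RRN] tel [PHONE]. Visited [DATE] for chest pain f/u.
```

In an evaluation like the one described above, `predicted` would be the set of spans found by the rules on the validation notes and `gold` the set of manually annotated identifiers. High precision with somewhat lower recall means the rules rarely flag non-identifiers but miss some identifiers they were not written for.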

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/193c/4278030/ff42e121b4a6/jkms-30-7-g001.jpg
