De-identifying Norwegian Clinical Text using Resources from Swedish and Danish.

Affiliations

Norwegian Centre for E-health Research, Tromsø, Norway.

Department of Informatics, Bioengineering, Robotics and Systems Engineering (DIBRIS), University of Genoa, Genoa, Italy.

Publication

AMIA Annu Symp Proc. 2024 Jan 11;2023:456-464. eCollection 2023.

Abstract

The lack of relevant annotated datasets is a key limitation in applying Natural Language Processing techniques to a broad range of tasks, among them the identification of Protected Health Information (PHI) in Norwegian clinical text. This work explores the possibility of exploiting resources from Swedish, a language very closely related to Norwegian, using a Swedish dataset annotated with PHI. Different processing and text augmentation techniques are evaluated, along with their impact on the final performance of the model. Augmentation techniques such as the injection and generation of both Norwegian and Scandinavian named entities into the Swedish training corpus were shown to increase de-identification performance on both Danish and Norwegian text. This trend was also confirmed by evaluating the model on a sample of Norwegian gastrosurgical clinical text.
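
The entity-injection augmentation mentioned above can be illustrated with a minimal sketch. It assumes a token-level, BIO-tagged Swedish corpus, hypothetical PHI label names (`First_Name`, `Last_Name`, `Location`) and small hypothetical Norwegian surrogate lists; the function `inject_entities` is illustrative only and is not the authors' implementation. Each annotated PHI span whose label has a surrogate list is replaced by a randomly chosen Norwegian surrogate, and BIO tags are re-emitted so multi-token surrogates stay consistently labelled. The entity generation technique is not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical Norwegian surrogate lists and label names (assumptions, not from the paper).
NORWEGIAN_SURROGATES: Dict[str, List[str]] = {
    "First_Name": ["Ingrid", "Ola", "Kari"],
    "Last_Name": ["Hansen", "Johansen", "Olsen"],
    "Location": ["Tromsø", "Bergen", "Stavanger"],
}


def inject_entities(
    tokens: Sequence[str],
    tags: Sequence[str],
    surrogates: Dict[str, List[str]],
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: swap each BIO-tagged PHI span for a random surrogate of the same label."""
    out_tokens: List[str] = []
    out_tags: List[str] = []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            label = tag[2:]
            # Find the end of the span: consecutive I- tags with the same label.
            j = i + 1
            while j < len(tokens) and tags[j] == f"I-{label}":
                j += 1
            if label in surrogates:
                # A surrogate may have several tokens; re-tag as B- followed by I-.
                new = rng.choice(surrogates[label]).split()
                out_tokens.extend(new)
                out_tags.extend([f"B-{label}"] + [f"I-{label}"] * (len(new) - 1))
            else:
                # No surrogate list for this label: keep the original span.
                out_tokens.extend(tokens[i:j])
                out_tags.extend(tags[i:j])
            i = j
        else:
            # "O" tokens (and any stray I- tags) are copied unchanged.
            out_tokens.append(tokens[i])
            out_tags.append(tag)
            i += 1
    return out_tokens, out_tags


# Usage on a made-up Swedish sentence ("The patient Anna Svensson lives in Uppsala.").
example_tokens = ["Patienten", "Anna", "Svensson", "bor", "i", "Uppsala", "."]
example_tags = ["O", "B-First_Name", "B-Last_Name", "O", "O", "B-Location", "O"]
print(inject_entities(example_tokens, example_tags, NORWEGIAN_SURROGATES, random.Random(0)))
```

Passing a seeded `random.Random` makes each augmented copy reproducible. Spans whose label has no surrogate list are copied unchanged, so the sketch never drops annotated PHI.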
