De-identifying Norwegian Clinical Text using Resources from Swedish and Danish.

Affiliations

Norwegian Centre for E-health Research, Tromsø, Norway.

Department of Informatics, Bioengineering, Robotics and System engineering (DIBRIS), University of Genoa, Genoa, Italy.

Publication information

AMIA Annu Symp Proc. 2024 Jan 11;2023:456-464. eCollection 2023.

PMID: 38222432

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10785939/

Abstract

The lack of relevant annotated datasets is a key limitation in applying Natural Language Processing techniques to a broad range of tasks, among them the identification of Protected Health Information (PHI) in Norwegian clinical text. This work explores the possibility of exploiting resources from Swedish, a language very closely related to Norwegian. The Swedish dataset is annotated with PHI. Different processing and text augmentation techniques are evaluated, along with their impact on the final performance of the model. Augmentation techniques, such as injecting and generating both Norwegian and Scandinavian named entities in the Swedish training corpus, were shown to improve performance on the de-identification task for both Danish and Norwegian text. This trend was also confirmed by evaluating model performance on a sample of Norwegian gastro-surgical clinical text.
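
The entity-injection augmentation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details of the procedure, so the BIO tagging scheme, the label names (`NAME`, `LOCATION`), the function `inject_entities`, and the tiny gazetteer below are all assumptions made for illustration. The idea shown is only the general one: swap each annotated PHI span in a Swedish training sentence for a Norwegian surrogate of the same type, keeping the labels aligned.

```python
import random
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (surface form, BIO tag); hypothetical format


def inject_entities(
    sentence: List[Token],
    gazetteer: Dict[str, List[str]],
    rng: random.Random,
) -> List[Token]:
    """Replace each tagged entity span with a surrogate of the same label.

    Spans whose label has no gazetteer entry are copied unchanged.
    """
    out: List[Token] = []
    i = 0
    while i < len(sentence):
        word, tag = sentence[i]
        if tag.startswith("B-"):
            label = tag[2:]
            # Find the end of the span: consecutive I-<label> tokens.
            j = i + 1
            while j < len(sentence) and sentence[j][1] == f"I-{label}":
                j += 1
            candidates = gazetteer.get(label)
            if candidates:
                # The surrogate may have a different token count than the span.
                surrogate = rng.choice(candidates).split()
                out.append((surrogate[0], f"B-{label}"))
                out.extend((w, f"I-{label}") for w in surrogate[1:])
            else:
                out.extend(sentence[i:j])
            i = j
        else:
            out.append((word, tag))
            i += 1
    return out


# Invented Swedish example sentence with hypothetical PHI labels.
swedish = [
    ("Patienten", "O"),
    ("Karin", "B-NAME"),
    ("Lindqvist", "I-NAME"),
    ("bor", "O"),
    ("i", "O"),
    ("Uppsala", "B-LOCATION"),
    (".", "O"),
]

# Invented Norwegian surrogates; a real gazetteer would be far larger.
norwegian_gazetteer = {
    "NAME": ["Ingrid Haugen"],
    "LOCATION": ["Tromsø"],
}

augmented = inject_entities(swedish, norwegian_gazetteer, random.Random(0))
for word, tag in augmented:
    print(f"{word}\t{tag}")
```

Because the surrogate is re-tagged token by token, a one-token entity can be replaced by a multi-token one (or the reverse) without breaking the alignment between tokens and labels, which is what a sequence-labelling model is trained on.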
