

A textual dataset of de-identified health records in Spanish and Catalan for medical entity recognition and anonymization.

Author information

Lima-López Salvador, Farré-Maduell Eulàlia, Gasco Luis, Rodríguez-Miret Jan, Frid Santiago, Pastor Xavier, Borrat Xavier, Krallinger Martin

Affiliations

NLP for Biomedical Information Analysis Unit, Barcelona Supercomputing Center, Barcelona, 08034, Spain.

Clinical Informatics, Hospital Clinic, Barcelona, 08036, Spain.

Publication information

Sci Data. 2025 Jul 1;12(1):1088. doi: 10.1038/s41597-025-05320-1.

Abstract

The advancement of clinical natural language processing systems is crucial to exploit the wealth of textual data contained in medical records. Diverse data sources are required in different languages and from different sites to represent global health services. To this end, we have released CARMEN-I, a corpus of anonymized clinical records from the Hospital Clinic of Barcelona written during the COVID-19 pandemic spanning a period of two years. In addition to COVID-19 cases of adult patients, CARMEN-I features multiple comorbidities such as cardiovascular conditions, oncology treatments, post-transplant complications, and infectious diseases. This resource is publicly accessible together with detailed annotation guidelines and granular text-bound annotations generated in a collaborative effort between clinicians, linguists, and engineers to enable training and evaluation of automatic anonymization systems. Moreover, for information extraction purposes, a subset of 500 records is annotated with six relevant clinical concept classes: diseases, symptoms, procedures, medications, pathogens and humans.

