Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
Medizinische Informatik, TU München, München, Germany.
Stud Health Technol Inform. 2024 Jan 25;310:599-603. doi: 10.3233/SHTI231035.
We report on one of the outcomes of a large-scale German research program, the Medical Informatics Initiative (MII), which aims to develop a solid data and software infrastructure for German-language clinical natural language processing. Within this framework, we have developed 3000PA, a national clinical reference corpus composed of patient records from three clinical university sites and annotated with multiple semantic layers (including medical named entities, semantic and temporal relations between entities, and certainty and negation information related to entities and relations). This non-sharable corpus is complemented by three sharable ones (JSYNCC, GGPONC, and GRASCCO). Overall, 3000PA, JSYNCC and GRASCCO feature about 2.1 million metadata points.