Lohr Christina, Luther Stephanie, Matthies Franz, Modersohn Luise, Ammon Danny, Saleh Kutaiba, Henkel Andreas G, Kiehntopf Michael, Hahn Udo
Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
Data Integration Center, IT Business Division, Jena University Hospital.
AMIA Annu Symp Proc. 2018 Dec 5;2018:770-779. eCollection 2018.
We present the outcome of an annotation effort targeting the content-sensitive segmentation of German clinical reports into sections. We recruited an annotation team of up to eight medical students to annotate a clinical text corpus on a sentence-by-sentence basis in four pre-annotation iterations and one final main annotation step. The annotation scheme we developed adheres to the section-heading categories defined for clinical documents in the HL7-CDA (Clinical Document Architecture) standard. Once the scheme became stable, we ran the main annotation campaign on the complete set of roughly 1,000 clinical documents. Because it builds on the CDA standard, the annotation scheme allows legacy and newly produced clinical documents to be integrated within a common pipeline. Finally, we used the annotations directly to train a baseline classifier that automatically identifies sections in clinical reports.
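The abstract does not say which baseline classifier was used. Purely as a hedged illustration, the sketch below shows one common way to frame sentence-level section identification: a multinomial Naive Bayes classifier (add-one smoothing) assigns each sentence a section label, and consecutive sentences that share a label are merged into one section. The CDA-style section labels, the German example sentences, and the names `NaiveBayesSectionClassifier`, `tokenize`, and `segment` are all hypothetical; this is not the authors' classifier or annotation scheme.

```python
import math
from collections import Counter, defaultdict
from itertools import groupby


def tokenize(sentence):
    # Naive lowercase whitespace tokenization (assumption: a real pipeline
    # would use a proper German tokenizer).
    return sentence.lower().split()


class NaiveBayesSectionClassifier:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)        # sentences per section label
        self.token_counts = defaultdict(Counter)   # token frequencies per label
        for sentence, label in zip(sentences, labels):
            self.token_counts[label].update(tokenize(sentence))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.total = len(labels)
        return self

    def predict(self, sentence):
        tokens = tokenize(sentence)
        best_label, best_score = None, -math.inf
        for label, n_sentences in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_sentences / self.total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def segment(sentences, labels):
    """Merge runs of consecutive sentences sharing a label into (label, sentences) sections."""
    return [(label, [s for s, _ in run])
            for label, run in groupby(zip(sentences, labels), key=lambda pair: pair[1])]


# Invented toy training data with hypothetical CDA-style section labels.
train_sentences = [
    "Anamnese: Patient klagt über Brustschmerzen",
    "Seit drei Tagen Dyspnoe",
    "Medikation: Ramipril 5 mg",
    "ASS 100 mg täglich",
    "Befund: Herztöne rein",
    "Lunge auskultatorisch frei",
]
train_labels = [
    "History of Present Illness", "History of Present Illness",
    "Medications", "Medications",
    "Physical Examination", "Physical Examination",
]

clf = NaiveBayesSectionClassifier().fit(train_sentences, train_labels)
report = ["Anamnese: seit drei Tagen Brustschmerzen", "Medikation: ASS 100 mg", "Lunge frei"]
predicted = [clf.predict(s) for s in report]
for label, section in segment(report, predicted):
    print(label, section)
```

Classifying sentence by sentence and then merging adjacent labels mirrors the sentence-level granularity of the annotation described above; a sequence model could additionally exploit the typical ordering of sections, which this bag-of-words sketch ignores.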