Kropf Stefan, Krücken Peter, Mueller Wolf, Denecke Kerstin
Stefan Kropf, Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstraße 14, 04103 Leipzig, Germany, E-mail:
Methods Inf Med. 2017 May 18;56(3):230-237. doi: 10.3414/ME16-01-0073. Epub 2017 Feb 28.
Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured by section headers, numbered lists, items and classification strings. Retrieving relevant documents nevertheless remains challenging, because keyword searches applied to complete, unstructured documents return many false positives.
We concentrate on the processing of pathology reports as an example of unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that improves access to and retrieval of relevant data. The data are to be stored in a standardized, structured way so that they are accessible to queries applied to specific sections of a document (section-sensitive queries) and available for information reuse.
Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. To enable focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried with XQuery and rendered as HTML with XSLT.
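The difference between a whole-document keyword search and a section-sensitive query can be illustrated with a minimal sketch. It is not the pipeline described above: the XML layout (`records`, `pehr`, `section`, the `name` attribute), the record ids and the sample texts are hypothetical and far simpler than archetype-based openEHR XML, and the limited XPath support of Python's standard library stands in for XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified records; real archetype-based XML
# is much more deeply nested than this.
RECORDS_XML = """
<records>
  <pehr id="r1">
    <section name="clinical_information">Suspected carcinoma of the larynx.</section>
    <section name="diagnosis">Squamous cell carcinoma.</section>
  </pehr>
  <pehr id="r2">
    <section name="clinical_information">Suspected carcinoma of the larynx.</section>
    <section name="diagnosis">Chronic inflammation.</section>
  </pehr>
</records>
"""


def search_whole_document(root, keyword):
    """Return ids of records whose full text contains the keyword."""
    kw = keyword.lower()
    return [
        record.get("id")
        for record in root.iter("pehr")
        if kw in "".join(record.itertext()).lower()
    ]


def search_section(root, section, keyword):
    """Return ids of records where the keyword occurs in the named section."""
    kw = keyword.lower()
    hits = []
    for record in root.iter("pehr"):
        # ElementTree's XPath subset supports attribute predicates.
        for node in record.findall("section[@name='%s']" % section):
            if kw in (node.text or "").lower():
                hits.append(record.get("id"))
                break
    return hits


root = ET.fromstring(RECORDS_XML)
whole = search_whole_document(root, "carcinoma")
focused = search_section(root, "diagnosis", "carcinoma")
print(whole)
print(focused)
```

In this toy data the whole-document search also returns the record in which carcinoma was only suspected, whereas the query restricted to the diagnosis section returns only the record with a carcinoma diagnosis.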
Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. Information modelling with openEHR saves time in the modelling process, since many archetypes can be reused. The resulting standardized, structured PEHRs allow relevant data to be accessed by retrieving the content that matches user queries.
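A keyword-based section boundary detection of this kind might look like the following sketch. The header keywords, the assumption that each header starts a line and ends with a colon, and the sample report are illustrative only; the abstract does not list the keywords or rules actually used.

```python
import re
from typing import Dict

# Hypothetical header keywords, chosen for illustration only.
SECTION_KEYWORDS = ["Clinical information", "Macroscopy", "Microscopy", "Diagnosis"]

# Assumption: a section header is a keyword at the start of a line,
# followed by a colon.
HEADER_RE = re.compile(
    r"^(%s)\s*:\s*" % "|".join(re.escape(k) for k in SECTION_KEYWORDS),
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(report: str) -> Dict[str, str]:
    """Split a free-text report into sections at keyword headers."""
    sections = {}
    matches = list(HEADER_RE.finditer(report))
    for i, match in enumerate(matches):
        # A section runs from the end of its header to the next header,
        # or to the end of the report for the last section.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1).lower()] = report[match.end():end].strip()
    return sections


report = """Clinical information: Suspected carcinoma of the larynx.
Macroscopy: Tissue sample, 1.2 cm in diameter.
Diagnosis: Squamous cell carcinoma, pT2."""

sections = split_sections(report)
for name, text in sections.items():
    print(name, "->", text)
```

Text preceding the first recognised header is ignored in this sketch; a real pipeline would need a rule for such content and for header variants.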
Mapping unstructured reports into a standardized information model is a practical way to improve access to data. Archetype-based XML enables section-sensitive retrieval and visualisation with well-established XML techniques. Focusing retrieval on particular sections has the potential to save retrieval time and improve retrieval accuracy.