Zvára Karel, Tomečková Marie, Peleška Jan, Svátek Vojtěch, Zvárová Jana
Prof. Jana Zvárová, Ph.D., DSc., FEFMI, Institute of Hygiene and Epidemiology, 1st Faculty of Medicine, Charles University, Studnickova 7, 128 00 Prague 2, Czech Republic, E-mail:
Methods Inf Med. 2017 May 18;56(3):217-229. doi: 10.3414/ME16-01-0083. Epub 2017 Apr 28.
Our main objective is to design a method of, and supporting software for, interactive correction and semantic annotation of narrative clinical reports, which would allow for their easier and less error-prone processing outside their original context: first, by physicians unfamiliar with the original language (and possibly also the source specialty), and second, by tools requiring structured information, such as decision-support systems. An additional goal is to gain insight into the process of narrative report creation, including the errors and ambiguities arising therein, and into the process of annotating reports with clinical terms. Finally, we aim to provide a dataset of ground-truth transformations (specific to Czech as the source language), created by expert physicians, which can be reused in future analytical studies and for training automated transformation procedures.
A three-phase preprocessing method has been developed to support the secondary use of narrative clinical reports in electronic health records. Narrative clinical reports are the narrative (free-text) parts of healthcare documentation, often stored in electronic health records. In the first phase, the narrative clinical report is tokenized. In the second phase, the tokenized report is normalized; the normalized report is easily readable by health professionals who know the language in which the report was written. In the third phase, the normalized report is enriched with extracted structured information. The final result of the third phase is a semi-structured normalized clinical report in which the extracted clinical terms are matched to codebook terms. Software tools for interactive correction, expansion and semantic annotation of narrative clinical reports have been developed, and the three-phase preprocessing method has been validated in the area of cardiology.
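The three phases can be illustrated with a minimal Python sketch. It is not the authors' software: the dictionaries `NORMALIZATION_MAP` and `CODEBOOK`, the regular-expression tokenizer and the greedy longest-match annotator are all hypothetical stand-ins, and in the described method corrections and codebook assignments are made interactively by physicians rather than by fixed lookup tables. English toy strings are used for readability, although the real input is Czech.

```python
import re
from dataclasses import dataclass

# Hypothetical correction/expansion dictionary (phase 2). In the described
# method this work is done interactively by physicians, not by a fixed table.
NORMALIZATION_MAP = {
    "ht": "hypertension",
    "fibrilation": "fibrillation",
    "ekg": "ECG",
}

# Hypothetical miniature codebook (phase 3): phrase -> (classification system, code).
CODEBOOK = {
    "hypertension": ("ICD-10", "I10"),
    "atrial fibrillation": ("ICD-10", "I48"),
}


@dataclass
class Annotation:
    phrase: str
    system: str
    code: str


def tokenize(report: str) -> list[str]:
    """Phase 1: split the narrative text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", report)


def normalize(tokens: list[str]) -> list[str]:
    """Phase 2: expand abbreviations and fix misspellings token by token."""
    return [NORMALIZATION_MAP.get(t.lower(), t) for t in tokens]


def annotate(tokens: list[str], max_len: int = 3) -> list[Annotation]:
    """Phase 3: greedily match the longest token n-gram present in the codebook."""
    lowered = [t.lower() for t in tokens]
    found: list[Annotation] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(lowered[i:i + n])
            if phrase in CODEBOOK:
                system, code = CODEBOOK[phrase]
                found.append(Annotation(phrase, system, code))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no codebook term starts here; move to the next token
    return found
```

For example, the hypothetical input "Pt with HT and atrial fibrilation, EKG normal." is normalized so that "HT" becomes "hypertension" and "fibrilation" becomes "fibrillation", after which the annotator links "hypertension" to ICD-10 I10 and "atrial fibrillation" to ICD-10 I48. Greedy longest match is used here only because it is the simplest way to prefer a multi-word term over its single-word parts.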
The three-phase preprocessing method was validated on 49 anonymized Czech narrative clinical reports in the field of cardiology. Descriptive statistics were calculated from the database of completed transformations. Two cardiologists participated in the annotation phase. The first cardiologist mapped 1500 clinical terms found in the 49 narrative clinical reports to codebook terms using the classification systems ICD-10, SNOMED CT, LOINC and LEKY. The second cardiologist validated the first cardiologist's annotations. The correct clinical terms and the corresponding codebook terms have been stored in a database.
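The stored annotations and simple descriptive statistics over them might be organized as in the following Python sketch. The record layout (`AnnotationRecord`), its field names and the `describe` function are hypothetical: the paper does not specify its database schema or exactly which statistics were computed. The sketch only mirrors the stated workflow, in which one annotator assigns a code from one of the four classification systems and a second annotator reviews it; modelling that review as a simple accept/reject flag is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layout; the paper does not publish its database schema.
@dataclass
class AnnotationRecord:
    report_id: int      # narrative report the term was found in
    clinical_term: str  # clinical term as stored after correction
    system: str         # "ICD-10", "SNOMED CT", "LOINC" or "LEKY"
    code: str           # codebook term assigned by the first annotator
    # Assumed accept/reject verdict of the second annotator; None = not yet reviewed.
    validated: Optional[bool] = None


def describe(records: list[AnnotationRecord]) -> dict:
    """Hypothetical descriptive statistics over the stored annotations."""
    reviewed = [r for r in records if r.validated is not None]
    return {
        "terms": len(records),
        "reports": len({r.report_id for r in records}),
        "per_system": Counter(r.system for r in records),
        # Share of reviewed annotations the second annotator accepted.
        "acceptance_rate": (
            sum(r.validated for r in reviewed) / len(reviewed) if reviewed else None
        ),
    }
```

With four hypothetical ICD-10 records drawn from two reports, three of which have been reviewed and two accepted, `describe` reports 4 terms, 2 reports, all 4 terms under ICD-10, and an acceptance rate of 2/3.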
We extracted structured information from Czech narrative clinical reports using the proposed three-phase preprocessing method and linked it to electronic health records. The software tool, although generic, is tailored to Czech as the specific language of the electronic health records under study. This will provide a potential reference standard for porting the approach to dozens of other less widely spoken languages. Structured information can support medical decision making, quality assurance tasks and further medical research.