Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, PV, Italy.
Int J Med Inform. 2018 Mar;111:140-148. doi: 10.1016/j.ijmedinf.2017.12.013. Epub 2017 Dec 23.
In this work, we propose an ontology-driven approach to identify events and their attributes from episodes of care included in medical reports written in Italian. For this language, shared resources for clinical information extraction are not easily accessible.
The corpus considered in this work includes 5432 non-annotated medical reports belonging to patients with rare arrhythmias. To guide the information extraction process, we built a domain-specific ontology that includes the events and the attributes to be extracted, with related regular expressions. The ontology and the annotation system were constructed on a development set, while the performance was evaluated on an independent test set. As a gold standard, we considered a manually curated hospital database named TRIAD, which stores most of the information written in reports.
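A minimal sketch of how an ontology that pairs events and attributes with regular expressions could drive extraction. Everything here is an illustrative assumption rather than the authors' ontology or system: the `ONTOLOGY` dictionary, the event and attribute names, the Italian patterns, the sentence-splitting rule, and the sample report are all hypothetical.

```python
import re

# Hypothetical mini-ontology (not the one from the paper): each event has a
# trigger pattern and a set of attribute patterns with one capture group.
ONTOLOGY = {
    "ECG": {
        "trigger": r"\b(?:ECG|elettrocardiogramma)\b",
        "attributes": {
            "heart_rate_bpm": r"frequenza cardiaca\s*:?\s*(\d{2,3})\s*bpm",
            "qtc_ms": r"QTc\s*:?\s*(\d{3})\s*ms",
        },
    },
    "Holter": {
        "trigger": r"\bholter\b",
        "attributes": {
            "ventricular_ectopic_beats": r"(\d+)\s+extrasistoli ventricolari",
        },
    },
}


def extract_events(text):
    """Return one record per (sentence, event) whose trigger pattern matches."""
    results = []
    # Naive sentence split on '.' or ';' followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.;])\s+", text):
        for event, spec in ONTOLOGY.items():
            if not re.search(spec["trigger"], sentence, flags=re.IGNORECASE):
                continue
            attributes = {}
            for name, pattern in spec["attributes"].items():
                match = re.search(pattern, sentence, flags=re.IGNORECASE)
                if match:
                    # Link the attribute value to the event found in the same sentence.
                    attributes[name] = match.group(1)
            results.append({"event": event, "attributes": attributes})
    return results


# Hypothetical report fragment, written only for this example.
report = (
    "ECG: ritmo sinusale, frequenza cardiaca 72 bpm, QTc 440 ms. "
    "Holter: 120 extrasistoli ventricolari."
)

for record in extract_events(report):
    print(record)
```

Keeping the patterns in a data structure separate from the matching loop is one way to make the resource easy to enrich or translate: adapting to another language would then mean editing patterns, not code.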
The proposed approach performs well on the Italian medical corpus considered, with more than 90% of annotations correct for most of the clinical events examined. We also assessed the possibility of adapting the system to another language (English), with promising results.
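As an illustration of how a per-event percentage of correct annotations could be computed against a curated database, the sketch below compares extracted values with gold-standard values keyed by report, event, and attribute. The function name `percent_correct`, the key layout, the exact-match criterion, and the sample values are assumptions made for this example; the abstract does not specify how the comparison was implemented.

```python
from collections import defaultdict


def percent_correct(extracted, gold):
    """Percentage of gold entries reproduced exactly, grouped by event.

    Both arguments map (report_id, event, attribute) -> value.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for key, gold_value in gold.items():
        event = key[1]
        total[event] += 1
        # A missing or different extracted value counts as incorrect.
        if extracted.get(key) == gold_value:
            correct[event] += 1
    return {event: 100.0 * correct[event] / total[event] for event in total}


# Hypothetical values, invented only to exercise the function.
gold = {
    ("r1", "ECG", "heart_rate_bpm"): "72",
    ("r1", "ECG", "qtc_ms"): "440",
    ("r1", "Holter", "ventricular_ectopic_beats"): "120",
    ("r2", "Holter", "ventricular_ectopic_beats"): "35",
}
extracted = {
    ("r1", "ECG", "heart_rate_bpm"): "72",
    ("r1", "ECG", "qtc_ms"): "440",
    ("r1", "Holter", "ventricular_ectopic_beats"): "120",
    # The r2 Holter value was not extracted.
}

print(percent_correct(extracted, gold))
```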
Our annotation system relies on a domain ontology to extract and link information in clinical text. We developed an ontology that can be easily enriched and translated, and the system performs well on the task considered. In the future, it could be used to populate the TRIAD database automatically.