Anetta Kristof, Horak Ales, Wojakowski Wojciech, Wita Krystian, Jadczyk Tomasz
Natural Language Processing Centre, Faculty of Informatics, Masaryk University, 602 00 Brno, Czech Republic.
Department of Cardiology and Structural Heart Diseases, School of Medicine in Katowice, Medical University of Silesia, 40-055 Katowice, Poland.
J Pers Med. 2022 May 25;12(6):869. doi: 10.3390/jpm12060869.
Electronic health records hold most of their medical information as doctors' notes, in unstructured or semi-structured text. Current deep learning approaches to text analysis let researchers uncover the semantics of such text and even identify hidden consequences that can offer doctors additional decision support. In this article, we present a new automated analysis of Polish summary texts of patient hospitalizations. The presented models predict the final diagnosis with almost 70% accuracy from the patient's medical history alone (only 132 words on average), and accuracy can rise when further sentences from the hospitalization results are added: even one sentence was found to improve the results by 4%, and the best accuracy of 78% was achieved with five extra sentences. In addition to detailed descriptions of the data and methodology, we evaluate the analysis on more than 50,000 Polish cardiology patient texts and provide a detailed error analysis of the approach. The results indicate that deep analysis of the medical history summary alone can suggest the direction of diagnosis with high probability, and that this probability can be increased further by supplementing the records with additional examination results.
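The evaluation setup described in the abstract (medical history alone, then history plus a growing number of sentences from the hospitalization results) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `split_sentences`, `build_input`, and `accuracy`, the regex-based sentence splitter, and the sample texts and labels are all assumptions made for illustration, and the classifier itself is left out.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace.

    Assumption: real clinical notes (abbreviations, dosages) would need
    a proper sentence tokenizer; this is only for illustration.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_input(history, hospitalization, extra_sentences=0):
    """Return the history plus the first `extra_sentences` result sentences."""
    extra = split_sentences(hospitalization)[:extra_sentences]
    return " ".join([history.strip()] + extra)


def accuracy(predicted, gold):
    """Fraction of predicted diagnoses that match the gold diagnoses."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical example texts, written in English for readability.
history = "History of hypertension."
results = "ECG normal. Echo shows reduced EF. Troponin elevated."

history_only = build_input(history, results, extra_sentences=0)
history_plus_one = build_input(history, results, extra_sentences=1)
```

Varying `extra_sentences` from 0 to 5 and scoring a classifier's predictions with `accuracy` at each setting would mirror the comparison reported in the abstract, under the assumptions stated above.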