Krsnik Ivan, Glavaš Goran, Krsnik Marina, Miletić Damir, Štajduhar Ivan
Department of Computer Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000 Rijeka, Croatia.
School of Business Informatics and Mathematics, University of Mannheim, 68159 Mannheim, Germany.
Diagnostics (Basel). 2020 Apr 1;10(4):196. doi: 10.3390/diagnostics10040196.
Narrative texts in electronic health records can be efficiently utilized for building clinical decision support systems only if they are automatically and correctly interpreted in accordance with a specified standard. This paper tackles the problem of developing an automated method for labeling free-form radiology reports, as a precursor to building query-capable report databases in hospitals. The analyzed dataset consists of 1295 radiology reports concerning the condition of the knee, retrospectively gathered at the Clinical Hospital Centre Rijeka, Croatia. Reports were manually labeled with one or more labels from a set of the 10 most commonly occurring clinical conditions. After primary preprocessing of the texts, two sets of text classification methods were compared: (1) traditional classification models, namely Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forests (RF), coupled with Bag-of-Words (BoW) features (i.e., a symbolic text representation), and (2) a Convolutional Neural Network (CNN) coupled with dense word vectors (i.e., word embeddings as a semantic text representation) as input features. We used nested 10-fold cross-validation to evaluate the performance of the competing methods using accuracy, precision, recall, and F1 score. The CNN with semantic word representations as input yielded the overall best performance, with a micro-averaged F1 score of 86.7%. The CNN classifier yielded particularly encouraging results for the most represented conditions: degenerative disease (95.9%), arthrosis (93.3%), and injury (89.2%). As a data-hungry deep learning model, however, the CNN performed notably worse than the competing models on underrepresented classes with fewer training instances, such as multicausal disease or metabolic disease. LR, RF, and SVM performed comparably well, with micro-averaged F1 scores of 84.6%, 82.2%, and 82.1%, respectively.
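The traditional baselines described above amount to a multi-label text classification pipeline: Bag-of-Words features feeding one binary classifier per clinical condition, evaluated with a micro-averaged F1 score. The sketch below is only an illustration of that setup under assumed settings, not the authors' code; the toy reports, label names, and scikit-learn components are hypothetical stand-ins for the real dataset and tuned pipelines.

```python
# Illustrative sketch only: toy data and settings, not the paper's pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Each report may carry one or more condition labels (hypothetical examples).
reports = [
    "degenerative changes of the medial meniscus",
    "traumatic injury with rupture of the anterior cruciate ligament",
    "advanced arthrosis with diffuse degenerative disease of the joint",
    "acute injury, partial tear of the lateral collateral ligament",
]
labels = [
    ["degenerative disease"],
    ["injury"],
    ["arthrosis", "degenerative disease"],
    ["injury"],
]

# Binarize the label sets: one indicator column per clinical condition.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# Bag-of-Words features + one-vs-rest logistic regression, i.e. one binary
# classifier per condition, mirroring the traditional (symbolic) baselines.
model = make_pipeline(
    CountVectorizer(lowercase=True),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(reports, Y)

# Micro-averaged F1 pools true/false positives over all labels; the paper
# reports it from nested 10-fold cross-validation, not on training data.
Y_pred = model.predict(reports)
print("micro-averaged F1:", f1_score(Y, Y_pred, average="micro"))
print("predicted labels:", mlb.inverse_transform(
    model.predict(["chronic degenerative changes of the cartilage"])))
```

The same pipeline can be re-run with MultinomialNB, LinearSVC, or RandomForestClassifier in place of the logistic regression to approximate the other three baselines, and wrapped in nested cross-validation for hyperparameter selection and evaluation.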
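The deep learning competitor is a CNN over dense word vectors. The following Keras sketch is a hypothetical minimal variant, assuming integer-encoded report tokens, a trainable embedding layer, a single 1-D convolution with global max pooling, and a sigmoid output unit per condition label; the layer sizes, kernel width, and random stand-in data are assumptions, not the architecture or embeddings reported in the paper.

```python
# Hypothetical minimal CNN sketch for multi-label report classification;
# sizes and data are placeholders, not the authors' configuration.
import numpy as np
import tensorflow as tf

vocab_size, max_len, embed_dim, n_labels = 5000, 200, 100, 10

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),      # dense word vectors
    tf.keras.layers.Conv1D(128, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_labels, activation="sigmoid"),  # one unit per label
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Random stand-in data: integer-encoded reports and binary label vectors.
X = np.random.randint(1, vocab_size, size=(32, max_len))
Y = np.random.randint(0, 2, size=(32, n_labels))
model.fit(X, Y, epochs=1, batch_size=8, verbose=0)
print(model.predict(X[:1]).shape)  # (1, 10): per-label probabilities
```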