Comprehensive Heart Failure Center (CHFC), Department of Internal Medicine I, Würzburg University Hospital, Am Schwarzenberg 15, 97078, Würzburg, Germany.
Chair of Computer Science VI, University of Würzburg, Würzburg, Germany.
Clin Res Cardiol. 2018 Sep;107(9):778-787. doi: 10.1007/s00392-018-1245-z. Epub 2018 Apr 17.
Heart failure is the leading cause of hospitalization and among the leading causes of death in Germany. However, accurate estimates of its prevalence and incidence are lacking. Reported figures drawn from different information sources are compromised by factors such as economic considerations and documentation quality.
We implemented a clinical data warehouse that integrates various information sources (structured parameters, plain text, data extracted by natural language processing) and enables reliable approximation of the true number of heart failure patients. For the years 2000-2015, we compared the performance of ICD-based diagnosis in detecting heart failure with (a) advanced definitions based on algorithms that integrate various sources of the hospital information system, and (b) a physician-based reference standard.
Applying these methods to inpatients showed that relying on ICD codes alone markedly underestimated the true prevalence of heart failure: by 44% in the validation dataset, and by 55% (single year) and 31% (all years) in the overall analysis. These percentages changed over the years, indicating secular changes in coding practice and efficiency. Search and permutation algorithms markedly improved performance, from an F1 score of 81% for the initial expert-specified query to 86% for the computer-optimized query; alternatively, the query could be optimized for precision or sensitivity, depending on the search objective.
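The metrics named above can be illustrated with a minimal sketch. It assumes each query returns a set of case identifiers that is compared against a physician-based reference set. The function names, the example identifiers, and the use of the F-beta score to favor precision or sensitivity are illustrative assumptions, not the authors' implementation. The "missed share" shown here is simply the fraction of reference cases a query fails to find, which may differ from how underestimation was defined in the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # cases found by the query and confirmed by the reference
    fp: int  # cases found by the query but not in the reference
    fn: int  # reference cases the query missed


def evaluate(predicted: set[str], reference: set[str]) -> Confusion:
    """Compare query hits with the reference standard (hypothetical helper)."""
    return Confusion(
        tp=len(predicted & reference),
        fp=len(predicted - reference),
        fn=len(reference - predicted),
    )


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f_beta(c: Confusion, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives F1, beta>1 weights sensitivity more,
    beta<1 weights precision more."""
    p, r = precision(c), sensitivity(c)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def missed_share(c: Confusion) -> float:
    """Fraction of reference cases that the query did not detect."""
    return 1.0 - sensitivity(c)


# Made-up identifiers, purely for illustration.
reference_cases = {f"case{i}" for i in range(1, 11)}        # 10 true cases
icd_hits = {f"case{i}" for i in range(1, 6)} | {"case11"}   # 5 true, 1 false

result = evaluate(icd_hits, reference_cases)
print(f"precision={precision(result):.3f}")
print(f"sensitivity={sensitivity(result):.3f}")
print(f"F1={f_beta(result):.3f}")
print(f"missed share={missed_share(result):.3f}")
```

Choosing `beta` above or below 1 is one common way to express a search objective that prefers sensitivity or precision, respectively; the abstract does not state which objective function the optimization used.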
Estimating prevalence of heart failure using ICD codes as the sole data source yielded unreliable results. Diagnostic accuracy was markedly improved using dedicated search algorithms. Our approach may be transferred to other hospital information systems.