University of Pittsburgh School of Nursing, 336 Victoria Hall; 3500 Victoria Street, Pittsburgh, PA, 15213, USA.
University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA.
J Clin Monit Comput. 2022 Apr;36(2):397-405. doi: 10.1007/s10877-021-00664-6. Epub 2021 Feb 8.
Big data analytics research using heterogeneous electronic health record (EHR) data requires accurate identification of disease phenotype cases and controls. Overreliance on ground truth determination based on administrative data can lead to biased and inaccurate findings. Hospital-acquired venous thromboembolism (HA-VTE) is challenging to identify because of its temporal evolution and variable EHR documentation. To establish ground truth for machine learning modeling, we compared the accuracy of HA-VTE diagnoses made by administrative coding with manual review of gold standard diagnostic test results. We performed a retrospective analysis of EHR data from 3680 adult stepdown unit patients to identify HA-VTE. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes for VTE were identified. A total of 4544 radiology reports associated with VTE diagnostic tests were screened using terminology extraction and then manually reviewed by a clinical expert to confirm the diagnosis. Of 415 cases with ICD-9-CM codes for VTE, 219 had acute-onset type codes. Test report review identified 158 new-onset HA-VTE cases. Only 40% of the acute-onset ICD-9-CM coded cases (n = 87) were confirmed by a positive diagnostic test report, leaving the majority of administratively coded cases unsubstantiated by a confirmatory diagnostic test. Additionally, 45% of diagnostic test-confirmed HA-VTE cases lacked corresponding ICD codes. ICD-9-CM coding both missed diagnostic test-confirmed HA-VTE cases and assigned VTE codes to patients without confirmed VTE, suggesting that dependence on administrative coding leads to inaccurate HA-VTE phenotyping. Alternative methods are needed to develop more sensitive and specific VTE phenotype solutions that are portable across EHR vendor data and can support case-finding in big data analytics.
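The two headline percentages follow directly from the reported counts. The sketch below recomputes them. It assumes that the 87 confirmed cases are the overlap between the 219 acute-onset coded cases and the 158 test-confirmed cases, and it treats manual test-report review as the reference standard. The function name `agreement_metrics` and the PPV/sensitivity labels are illustrative choices and do not come from the authors' analysis code.

```python
# Counts reported in the abstract.
ACUTE_CODED = 219     # cases with acute-onset ICD-9-CM VTE codes
TEST_CONFIRMED = 158  # new-onset HA-VTE cases found by test-report review
BOTH = 87             # assumption: overlap of the two groups above


def agreement_metrics(coded: int, confirmed: int, both: int) -> dict:
    """Hypothetical helper: compare coding against test-report review,
    treating the review as the reference standard."""
    if not 0 <= both <= min(coded, confirmed):
        raise ValueError("overlap cannot exceed either group size")
    return {
        # share of coded cases confirmed by a positive test report (PPV)
        "ppv": both / coded,
        # share of test-confirmed cases that also carry a code (sensitivity)
        "sensitivity": both / confirmed,
        # coded cases with no confirming test report
        "coded_unconfirmed": coded - both,
        # test-confirmed cases with no corresponding code
        "confirmed_uncoded": confirmed - both,
        # fraction of test-confirmed cases missed by coding
        "miss_rate": (confirmed - both) / confirmed,
    }


metrics = agreement_metrics(ACUTE_CODED, TEST_CONFIRMED, BOTH)
print(f"confirmed by test: {metrics['ppv']:.0%}, "
      f"missed by coding: {metrics['miss_rate']:.0%}")
```

Under these assumptions the script prints `confirmed by test: 40%, missed by coding: 45%`, which matches the abstract. The same overlap also implies 132 coded cases without a confirming test and 71 confirmed cases without a code.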