McPeek Hinz Eugenia R, Bastarache Lisa, Denny Joshua C
Department of Pediatrics, Duke University Medical Center, Durham, NC.
Departments of Biomedical Informatics, Nashville, TN.
AMIA Annu Symp Proc. 2013 Nov 16;2013:975-83. eCollection 2013.
Deep venous thrombosis and pulmonary embolism are diseases associated with significant morbidity and mortality. Known risk factors account for only a slight majority of venous thromboembolic disease (VTE), with the remainder of the risk presumably related to unidentified genetic factors. We designed a general-purpose natural language processing (NLP) algorithm to retrospectively capture both acute and historical cases of thromboembolic disease in a de-identified electronic health record. Applied to a separate evaluation set, the NLP algorithm achieved a positive predictive value (PPV) of 84.7% and a sensitivity of 95.3%, for an F-measure of 0.897, similar to the training-set F-measure of 0.925. Applying the same algorithm to problem lists alone, in patients without VTE ICD-9 codes, was the best means of capturing historical cases, with a PPV of 83%. Applying NLP to VTE ICD-9-positive cases and to the problem lists of patients without VTE ICD-9 codes provides an effective means of capturing both acute and historical cases of venous thromboembolic disease.
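The reported F-measure can be checked from the PPV and sensitivity given above. The sketch below assumes the balanced F-measure (F1, the harmonic mean of PPV and sensitivity); the abstract does not name the variant, but the reported figures are consistent with it. The function name `f_measure` is illustrative and is not taken from the paper.

```python
def f_measure(ppv: float, sensitivity: float) -> float:
    """Balanced F-measure (F1): harmonic mean of PPV and sensitivity.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if ppv + sensitivity == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Evaluation-set values reported in the abstract: PPV 84.7%, sensitivity 95.3%.
evaluation_f = f_measure(0.847, 0.953)
print(round(evaluation_f, 3))  # prints 0.897
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F-measure here sits closer to the PPV than to the sensitivity.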