Afzal Naveed, Sohn Sunghwan, Abram Sara, Scott Christopher G, Chaudhry Rajeev, Liu Hongfang, Kullo Iftikhar J, Arruda-Olson Adelaide M
Department of Health Sciences Research, Mayo Clinic, Rochester, Minn.
Department of Cardiovascular Diseases, Mayo Clinic, Rochester, Minn.
J Vasc Surg. 2017 Jun;65(6):1753-1761. doi: 10.1016/j.jvs.2016.11.031. Epub 2017 Feb 8.
Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm with billing code algorithms, using ankle-brachial index test results as the gold standard.
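The abstract uses ankle-brachial index (ABI) results as the gold standard but does not state the exact labeling rule. The sketch below assumes a common clinical convention. For each leg, it divides the higher of the two ankle pressures (dorsalis pedis, posterior tibial) by the higher of the two brachial pressures. A patient is labeled PAD when either leg's ABI is at or below 0.90. The function names and the 0.90 cutoff are illustrative assumptions, not the study's documented criteria. Other criteria, such as poorly compressible arteries, are not modeled.

```python
def leg_abi(dorsalis_pedis: float, posterior_tibial: float,
            left_brachial: float, right_brachial: float) -> float:
    """Ankle-brachial index for one leg (all pressures in mmHg).

    Assumed convention: higher ankle pressure / higher brachial pressure.
    """
    brachial = max(left_brachial, right_brachial)
    if brachial <= 0:
        raise ValueError("brachial pressure must be positive")
    return max(dorsalis_pedis, posterior_tibial) / brachial


def gold_standard_pad(left_leg_abi: float, right_leg_abi: float,
                      threshold: float = 0.90) -> bool:
    """Hypothetical gold-standard label: PAD if either leg's ABI <= threshold."""
    return min(left_leg_abi, right_leg_abi) <= threshold


left = leg_abi(118, 110, 132, 140)   # 118 / 140
right = leg_abi(150, 142, 132, 140)  # 150 / 140
print(round(left, 3), round(right, 3), gold_standard_pad(left, right))
# prints: 0.843 1.071 True
```

The pressures in the example are made-up values, chosen only to show one leg below the assumed cutoff.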
We compared the performance of the NLP algorithm with (1) gold standard ankle-brachial index results; (2) previously validated algorithms based on relevant International Classification of Diseases, Ninth Revision diagnostic codes (simple model); and (3) a combination of International Classification of Diseases, Ninth Revision codes with procedural codes (full model). A dataset of 1569 patients (PAD cases and controls) was randomly divided into training (n = 935) and testing (n = 634) subsets.
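The random division of the 1569-patient dataset into 935 training and 634 testing patients can be sketched as a seeded shuffle followed by a cut. This assumes a simple unstratified split. The abstract does not say whether the split was stratified by case status. The seed and the helper name `split_cohort` are hypothetical.

```python
import random
from typing import Iterable, List, Tuple


def split_cohort(patient_ids: Iterable[int], n_train: int,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle patient IDs reproducibly and cut them into train/test lists."""
    ids = list(patient_ids)
    if not 0 <= n_train <= len(ids):
        raise ValueError("n_train must be between 0 and the cohort size")
    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


train, test = split_cohort(range(1569), n_train=935, seed=2017)
print(len(train), len(test))  # prints: 935 634
```

Tuning note sections, note types, and service types on `train` only, and scoring once on `test`, keeps the reported test-set figures free of tuning bias.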
We iteratively refined the NLP algorithm on the training set, adjusting which narrative note sections, note types, and service types it processed, to maximize its accuracy. In the testing dataset, compared with both the simple and full models, the NLP algorithm had better accuracy (NLP, 91.8%; full model, 81.8%; simple model, 83%; P < .001), positive predictive value (NLP, 92.9%; full model, 74.3%; simple model, 79.9%; P < .001), and specificity (NLP, 92.5%; full model, 64.2%; simple model, 75.9%; P < .001).
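The three reported metrics come from a confusion matrix of algorithm labels against the ABI gold standard:

- accuracy = (TP + TN) / all patients
- positive predictive value = TP / (TP + FP)
- specificity = TN / (TN + FP)

The sketch below computes them. The labels in the example are hypothetical and are not the study's data. The abstract does not give the underlying counts, and this code does not reproduce its P values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    tp: int  # algorithm says PAD, gold standard says PAD
    fp: int  # algorithm says PAD, gold standard says control
    tn: int  # both say control
    fn: int  # algorithm says control, gold standard says PAD


def tabulate(predicted: Sequence[bool], gold: Sequence[bool]) -> Confusion:
    """Count agreement between algorithm labels and gold-standard labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    return Confusion(
        tp=sum(p and g for p, g in pairs),
        fp=sum(p and not g for p, g in pairs),
        tn=sum(not p and not g for p, g in pairs),
        fn=sum(not p and g for p, g in pairs),
    )


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def ppv(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)


# Hypothetical labels for five patients (not data from the study).
c = tabulate([True, True, False, False, True], [True, False, False, True, True])
print(c, accuracy(c), ppv(c), specificity(c))
# accuracy 0.6, PPV 2/3, specificity 0.5
```

Specificity and PPV both penalize false positives. This is consistent with the billing-code models scoring lowest on exactly those two metrics.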
A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support.