Tian Zhe, Sun Simon, Eguale Tewodros, Rochefort Christian M
*Department of Epidemiology, Biostatistics and Occupational Health, Faculty of Medicine †McGill Clinical and Health Informatics Research Group, McGill University ‡Department of Radiology, McGill University Health Centre, Montreal, QC, Canada §Brigham & Women's Hospital, Boston, MA ∥Ingram School of Nursing, Faculty of Medicine, McGill University, Montreal, QC, Canada.
Med Care. 2017 Oct;55(10):e73-e80. doi: 10.1097/MLR.0000000000000346.
Surveillance of venous thromboembolisms (VTEs) is necessary for improving patient safety in acute care hospitals, but current detection methods are inaccurate and inefficient. With the growing availability of clinical narratives in an electronic format, automated surveillance using natural language processing (NLP) techniques may represent a better method.
We assessed the accuracy of using symbolic NLP for identifying the 2 clinical manifestations of VTE, deep vein thrombosis (DVT) and pulmonary embolism (PE), from narrative radiology reports.
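To make "symbolic NLP" concrete, the following is a minimal sketch of a rule-based classifier that flags a report as DVT-positive when a trigger term appears without a negation cue earlier in the same sentence. The trigger terms, negation cues, and the function name `classify_dvt` are illustrative assumptions; the abstract does not describe the rules the study's classifiers used.

```python
import re

# Hypothetical trigger and negation patterns (assumptions for illustration,
# not the rules used in the study).
DVT_TERMS = re.compile(r"\b(deep (?:vein|venous) thrombosis|dvt)\b", re.IGNORECASE)
NEGATION_CUES = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)


def classify_dvt(report: str) -> bool:
    """Return True if some sentence mentions DVT with no negation cue before it."""
    # Naive sentence split on periods and line breaks
    for sentence in re.split(r"[.\n]+", report):
        match = DVT_TERMS.search(sentence)
        if match is None:
            continue
        # Inspect only the text preceding the first mention in this sentence
        if NEGATION_CUES.search(sentence[:match.start()]) is None:
            return True
    return False
```

A production rule set would also need to handle uncertainty, history ("prior DVT"), and negation that follows the mention, none of which this toy handles.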
A random sample of 4000 narrative reports was selected from imaging studies that could diagnose DVT or PE and that were performed between 2008 and 2012 in a university health network of 5 adult-care hospitals in Montreal (Canada). The reports were coded by clinical experts to identify positive and negative cases of DVT and PE, which served as the reference standard. Using data from the largest hospital (n=2788), 2 symbolic NLP classifiers were trained: one for DVT, the other for PE. The accuracy of these classifiers was tested on data from the other 4 hospitals (n=1212).
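The hospital-based split described above, training on one site and testing on the others, can be sketched as follows. The `Report` record and its field names are assumptions made for illustration; the abstract does not specify how the data were stored.

```python
from typing import List, NamedTuple, Tuple


class Report(NamedTuple):
    hospital: str   # site identifier (hypothetical field)
    text: str       # narrative radiology report
    label: bool     # expert-coded reference standard


def split_by_hospital(
    reports: List[Report], training_hospital: str
) -> Tuple[List[Report], List[Report]]:
    """Put one hospital's reports in the training set and all others in the test set."""
    train = [r for r in reports if r.hospital == training_hospital]
    test = [r for r in reports if r.hospital != training_hospital]
    return train, test
```

Splitting by site rather than at random means the test set measures how well the classifiers transfer to hospitals whose reports were never seen during training.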
On manual review, 663 DVT-positive and 272 PE-positive reports were identified. In the testing dataset, the DVT classifier achieved 94% sensitivity (95% CI, 88%-97%), 96% specificity (95% CI, 94%-97%), and 73% positive predictive value (95% CI, 65%-80%), whereas the PE classifier achieved 94% sensitivity (95% CI, 89%-97%), 96% specificity (95% CI, 95%-97%), and 80% positive predictive value (95% CI, 73%-85%).
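The reported measures follow from a 2x2 table of classifier output against the reference standard. The sketch below computes them and attaches a Wilson score interval to a proportion. The choice of the Wilson method is an assumption, since the abstract does not state how its confidence intervals were derived, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, and positive predictive value from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among reference positives
        "specificity": tn / (tn + fp),   # true negatives among reference negatives
        "ppv": tp / (tp + fp),           # true positives among predicted positives
    }
```

Because positive predictive value depends on how common the condition is in the tested sample, it can sit well below sensitivity and specificity when positive reports are a minority, which is consistent with the pattern in the figures above.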
Symbolic NLP can accurately identify VTEs from narrative radiology reports. This method could facilitate VTE surveillance and the evaluation of preventive measures.