St. Michael's Hospital, Unity Health Toronto, Toronto, ON, Canada; Department of Medicine, University of Toronto, Toronto, ON, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada.
Department of Medicine, University of Toronto, Toronto, ON, Canada.
Thromb Res. 2022 Jan;209:51-58. doi: 10.1016/j.thromres.2021.11.020. Epub 2021 Nov 27.
Identifying venous thromboembolism (VTE) from large clinical and administrative databases is important for research and quality improvement.
The objective was to develop and validate natural language processing (NLP) algorithms to identify VTE from radiology reports among general internal medicine (GIM) inpatients.
This cross-sectional study included GIM hospitalizations between April 1, 2010 and March 31, 2017 at 5 hospitals in Toronto, Ontario, Canada. We developed NLP algorithms to identify pulmonary embolism (PE) and deep venous thrombosis (DVT) from radiologist reports of thoracic computed tomography (CT), extremity compression ultrasound (US), and nuclear ventilation-perfusion (VQ) scans in a training dataset of 1551 hospitalizations. We compared the accuracy of our NLP algorithms, the previously published "simpleNLP" tool, and administrative discharge diagnosis codes (ICD-10-CA) for PE and DVT against the "gold standard" of manual review in a separate random sample of 4000 GIM hospitalizations.
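The abstract does not describe the rules inside the algorithms, so the following is only a minimal sketch of how a rule-based classifier for radiology report text might work, assuming simple keyword matching with negation handling. The patterns and the function names `sentence_is_positive` and `report_mentions_pe` are hypothetical illustrations and are not taken from the published CHARTextract rulesets.

```python
import re

# Hypothetical patterns for illustration only; NOT the published rulesets.
POSITIVE = re.compile(
    r"\b(pulmonary embol(?:ism|us|i)|filling defect)\b", re.IGNORECASE
)
NEGATION = re.compile(
    r"\b(no|without|negative for|no evidence of)\b", re.IGNORECASE
)


def sentence_is_positive(sentence):
    """Return True if the sentence mentions a finding that is not negated."""
    match = POSITIVE.search(sentence)
    if match is None:
        return False
    # Treat the finding as negated when a negation cue precedes it
    # within the same sentence.
    return NEGATION.search(sentence[:match.start()]) is None


def report_mentions_pe(report):
    """Return True if any sentence of the report asserts a PE finding."""
    sentences = re.split(r"(?<=[.!?])\s+", report)
    return any(sentence_is_positive(s) for s in sentences)


positive_report = (
    "There is a filling defect in the right lower lobe pulmonary artery. "
    "No pleural effusion."
)
negative_report = "No evidence of pulmonary embolism. Lungs are clear."

print(report_mentions_pe(positive_report))
print(report_mentions_pe(negative_report))
```

A real ruleset would need to handle far more phrasing, for example chronic or previously documented clots and hedged language such as "cannot exclude", which this sketch ignores.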
Our NLP algorithms were highly accurate for identifying DVT from US, with sensitivity 0.94, positive predictive value (PPV) 0.90, and area under the receiver operating characteristic curve (AUC) 0.96, and for identifying PE from CT, with sensitivity 0.91, PPV 0.89, and AUC 0.96. Administrative diagnosis codes and the simpleNLP tool were less accurate for DVT (ICD-10-CA sensitivity 0.63, PPV 0.43, AUC 0.81; simpleNLP sensitivity 0.41, PPV 0.36, AUC 0.66) and for PE (ICD-10-CA sensitivity 0.83, PPV 0.70, AUC 0.91; simpleNLP sensitivity 0.89, PPV 0.62, AUC 0.92).
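For readers less familiar with these accuracy measures, the sketch below shows how sensitivity, PPV, and AUC can be computed when a yes/no classifier is compared against manual review. The counts are invented for illustration and are not data from the study. The AUC line assumes a single binary output, in which case the ROC curve has one operating point and its area equals the mean of sensitivity and specificity; the abstract does not state how the authors computed AUC.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute accuracy measures from a 2x2 table against a reference standard."""
    sensitivity = tp / (tp + fn)   # share of true cases that were detected
    specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
    ppv = tp / (tp + fp)           # share of flagged cases that were true
    # Assumption: one binary decision per hospitalization, so the ROC curve
    # is (0,0) -> (1 - specificity, sensitivity) -> (1,1) and its area is
    # the mean of sensitivity and specificity.
    auc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "auc": auc,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = diagnostic_metrics(tp=90, fp=10, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because true cases are rare relative to non-cases, PPV can be much lower than specificity suggests: even a small false-positive rate yields many false positives in absolute terms.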
Administrative diagnosis codes are unreliable in identifying VTE in hospitalized patients. We developed highly accurate NLP algorithms to identify VTE from radiology reports in a multicentre sample and have made the algorithms freely available to the academic community with a user-friendly tool (https://lks-chart.github.io/CHARTextract-docs/08-downloads/rulesets.html#venous-thromboembolism-vte-rulesets).