Department of Orthopedic Surgery, University of California - San Francisco, San Francisco, California, USA.
Department of Orthopedic Surgery, University of Texas at Austin, Austin, Texas, USA.
Bone Joint J. 2020 Jul;102-B(7_Supple_B):99-104. doi: 10.1302/0301-620X.102B7.BJJ-2019-1574.R1.
Natural Language Processing (NLP) offers an automated method to extract data from unstructured free text fields for arthroplasty registry participation. Our objective was to investigate how accurately NLP can be used to extract structured clinical data from unstructured clinical notes when compared with manual data extraction.
A group of 1,000 randomly selected clinical and hospital notes from eight different surgeons was collected for patients undergoing primary arthroplasty between 2012 and 2018. In all, 19 preoperative, 17 operative, and two postoperative variables of interest were manually extracted from these notes. An NLP algorithm was created to automatically extract these variables from a training sample of these notes, and the algorithm was tested on a random test sample of notes. Performance of the NLP algorithm was measured in Statistical Analysis System (SAS) by calculating the accuracy of the variables collected, the ability of the algorithm to collect the correct information when it was indeed in the note (sensitivity), and the ability of the algorithm to not collect a certain data element when it was not in the note (specificity).
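The three performance measures described above can be illustrated with a minimal sketch. The study computed them in SAS; the Python below is only an illustration, and the function name `score_variable`, the use of `None` to mean "not documented in the note", and the choice to count a wrong value as a miss are all assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a category is empty.
    return numerator / denominator if denominator else float("nan")


def score_variable(
    manual: Sequence[Optional[str]],
    extracted: Sequence[Optional[str]],
) -> Metrics:
    """Compare NLP output with manual extraction for one variable.

    None means the value was not in the note (manual) or was not
    collected by the algorithm (extracted). Hypothetical convention.
    """
    if len(manual) != len(extracted):
        raise ValueError("manual and extracted must have the same length")

    tp = tn = fp = fn = 0
    for gold, pred in zip(manual, extracted):
        if gold is not None:
            # Information was in the note: correct only if values match.
            # Assumption: a wrong value is counted as a miss.
            if pred == gold:
                tp += 1
            else:
                fn += 1
        else:
            # Information was not in the note: correct only if nothing
            # was collected.
            if pred is None:
                tn += 1
            else:
                fp += 1

    return Metrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
    )


# Made-up example: implant brand across five notes.
manual = ["BrandA", None, "BrandB", None, "BrandC"]
extracted = ["BrandA", None, None, "BrandB", "BrandC"]
result = score_variable(manual, extracted)
```

In this made-up example, two of the three documented brands are recovered (sensitivity 2/3), one of the two notes without a brand is correctly left empty (specificity 1/2), and three of five notes agree overall (accuracy 3/5).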
The NLP algorithm performed well at extracting variables from unstructured data in our random test dataset (accuracy = 96.3%, sensitivity = 95.2%, and specificity = 97.4%). It performed better at extracting data that were in a structured, templated format such as range of movement (ROM) (accuracy = 98%) and implant brand (accuracy = 98%) than data that were entered with variation depending on the author of the note such as the presence of deep-vein thrombosis (DVT) (accuracy = 90%).
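The gap between templated and free-text variables can be illustrated with a small sketch. The abstract does not describe how the algorithm works internally, so the regular-expression approach, the note wording, and the names `ROM_PATTERN`, `extract_rom`, and `naive_dvt_flag` below are hypothetical and are not the study's method.

```python
import re
from typing import Optional, Tuple

# Assumed template: "ROM: 0-120" or "ROM 5 to 110" (degrees).
ROM_PATTERN = re.compile(
    r"\bROM\s*:?\s*(\d{1,3})\s*(?:-|to)\s*(\d{1,3})",
    re.IGNORECASE,
)

# Naive keyword match for DVT, with no handling of negation or context.
DVT_PATTERN = re.compile(r"\b(DVT|deep[- ]vein thrombosis)\b", re.IGNORECASE)


def extract_rom(note: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the range of movement, or None if absent."""
    match = ROM_PATTERN.search(note)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def naive_dvt_flag(note: str) -> bool:
    """Flag any mention of DVT, whether affirmed or negated."""
    return DVT_PATTERN.search(note) is not None


templated = extract_rom("Knee exam. ROM: 0-120 degrees. No effusion.")
negated = naive_dvt_flag("No evidence of DVT.")
```

A fixed template gives the pattern a single predictable shape to match, whereas the keyword match flags "No evidence of DVT." as a mention of DVT. Author-dependent phrasing of this kind is one plausible reason a free-text variable would score lower than a templated one.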
The NLP algorithm used in this study was able to identify a subset of variables from randomly selected unstructured notes in arthroplasty with an accuracy above 90%. For some variables, such as objective exam data, the accuracy was very high. Our findings suggest that automated algorithms using NLP can help orthopaedic practices retrospectively collect information for registries and quality improvement (QI) efforts. Cite this article: Bone Joint J 2020;102-B(7 Supple B):99-104.