Attal Kush, Charalambous Lefko, Di Gangi Catherine, Rozell Joshua C
Division of Adult Reconstruction, Department of Orthopedic Surgery, NYU Langone Health, New York, New York.
J Arthroplasty. 2025 Jun 6. doi: 10.1016/j.arth.2025.06.008.
Annotating free-text clinical notes into structured data is critical for future large-scale data analysis in institutional and national orthopaedic registries. In total hip arthroplasty (THA), classifying implant fixation, use of technology, and especially surgical approach is particularly difficult for classical machine-learning techniques. In this pilot study, we evaluated the feasibility of using GPT-4 with a custom few-shot learning prompt to capture and justify these common elements in THA operative notes.
GPT-4 was given a few-shot learning prompt containing plain-language descriptions of the various fixations, technologies, and approaches, along with examples from gold-standard operative notes: four for fixation, 11 for technology, and 13 for surgical approach. The test set comprised 240 unique notes (60 for fixation, 90 for technology, and 120 for approach) from primary THAs performed by 38 surgeons at a single institution between November 2011 and March 2024. GPT-4's output was compared against manual chart review to measure accuracy. The quality of its clinical justifications was assessed using Flesch-Kincaid Grade Level scores for readability, self-BLEU scores for logical diversity, and character-level sequence matches with the original notes.
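The prompt construction described above can be illustrated with a minimal sketch. The actual prompt used in the study is not given here, so everything in this block is an assumption made for illustration: the `Example` fields, the `build_few_shot_prompt` helper, the label names, and the sample note text are all hypothetical, and no model API is called.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One gold-standard example (hypothetical structure)."""
    note_excerpt: str
    label: str
    justification: str


def build_few_shot_prompt(task, label_descriptions, examples, note):
    """Assemble a plain-text few-shot prompt for one classification task.

    task: element to classify, e.g. "implant fixation"
    label_descriptions: dict mapping label -> plain-language description
    examples: list of Example objects taken from gold-standard notes
    note: the unlabeled operative note to classify
    """
    lines = [
        f"Task: classify the {task} in a total hip arthroplasty operative note.",
        "Give one label and a justification that quotes the note.",
        "",
        "Labels:",
    ]
    for label, description in label_descriptions.items():
        lines.append(f"- {label}: {description}")
    lines.append("")
    for i, ex in enumerate(examples, start=1):
        lines += [
            f"Example {i}",
            f"Note: {ex.note_excerpt}",
            f"Label: {ex.label}",
            f"Justification: {ex.justification}",
            "",
        ]
    # The new note goes last, so the model completes the label and justification.
    lines += [f"Note: {note}", "Label:"]
    return "\n".join(lines)


# Illustrative inputs only; these are not taken from the study.
labels = {
    "cemented": "the note describes cement being used to fix a component",
    "cementless": "the note describes press-fit components without cement",
}
examples = [
    Example(
        note_excerpt="The stem was press-fit into the canal.",
        label="cementless",
        justification='The note states the stem was "press-fit".',
    )
]
prompt = build_few_shot_prompt(
    "implant fixation", labels, examples, "The femoral stem was cemented."
)
```

Keeping the descriptions and examples as data rather than hard-coded text would let the same helper serve the fixation, technology, and approach tasks with their different example counts.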
GPT-4 classified fixation, technology, and approach with overall accuracies of 100%, 98.9%, and 97.5%, respectively. The model also provided justifications for its classifications, with average Flesch-Kincaid Grade Level scores of 17.9, 16.2, and 24.4 for fixation, technology, and approach, respectively, and average self-BLEU scores below 0.1 for each. The justifications cited the notes directly, with character-level sequence matches of 87.6%, 89.2%, and 96.5% for fixation, technology, and approach, respectively.
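Two of the justification metrics reported above can be sketched as follows. The Flesch-Kincaid Grade Level formula is the standard one (0.39 × words per sentence + 11.8 × syllables per word − 15.59). The syllable counter is a rough vowel-group heuristic, and the character-level match is computed here with Python's `difflib.SequenceMatcher`. Both are assumptions, since the study's exact tooling is not stated. The helper names are hypothetical, and self-BLEU is not covered.

```python
import re
from difflib import SequenceMatcher


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level using the standard formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


def char_match_fraction(justification, note):
    """Fraction of justification characters that fall inside blocks
    also found, in order, in the note (assumed definition)."""
    if not justification:
        return 0.0
    # autojunk=False avoids difflib's heuristic that ignores frequent
    # characters in long sequences, which would distort long notes.
    matcher = SequenceMatcher(None, justification, note, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(justification)


# Illustrative text only; not taken from the study.
note = "A direct anterior approach was used. The stem was press-fit."
justification = "The stem was press-fit."
grade = flesch_kincaid_grade(justification)
match = char_match_fraction(justification, note)
```

Under this definition, a justification that quotes the note verbatim scores 1.0, and paraphrased wording lowers the fraction.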
Applying GPT-4 with a custom few-shot prompt to THA operative notes demonstrated excellent performance in capturing fixation, technology, and surgical approach. Moreover, the model's ability to cite details from the original notes is critical for model validation before widespread adoption. Together, these results make the method a promising alternative to manual chart review for clinical data capture.